LWN.net Weekly Edition for May 21, 2015
What's new in SPDX 2.0
Version 2.0 of the Software Package Data Exchange (SPDX) specification was unveiled on May 13. SPDX is designed to facilitate license-compliance efforts for large projects (or projects that simply include a large number of upstream components), as well as those undertaken by component vendors and product manufacturers. The new revision adds some important flexibility to the format, enabling cross-references between packages and support for packages that are delivered via version-control systems.
The SPDX format is the product of the SPDX workgroup at the Linux Foundation (LF). The version 2.0 page includes the formal specification [PDF] as well as some background material created by the workgroup, such as the requirements document [PDF] that set out goals for the new revision.
Chief among those goals was adapting SPDX to work well as a metadata format that propagates easily through a "supply chain" from one vendor to another. In SPDX 1.x, a single file was used to capture the licensing information for each package. When such packages were incorporated into a larger combined work, however, the information from those files had to be copied into a single replacement file describing the derivative work, because the SPDX 1.x format could not reference external files. The most significant changes in version 2.0 are those that overcome this limitation.
SPDX uses RDF/XML to record several types of metadata about a software package. The focus is generally placed on license information, given the importance of license compliance in open-source software, but the SPDX format incorporates several other sections containing additional metadata fields, such as a general-purpose package-information section (which includes the version number, the original source of the package, the provider of this particular copy of the program, and so on).
If it is not clear how this information would be of use to a development team, a presentation [PPT] from the 2015 Collaboration Summit includes a real-world example for ActiveMQ. The official ActiveMQ packages released by the Apache Software Foundation bundle in Jetty (which itself is copyrighted by the Eclipse Foundation), while Jetty bundles in javax.servlet from Glassfish (which is copyrighted by Oracle). So the SPDX document for ActiveMQ denotes which of the many files in the package are under which license (in this case, Apache 2.0 for ActiveMQ, either Apache 2.0 or Eclipse 1.0 for Jetty, and either CDDL or GPL for javax.servlet) as well as the respective copyright holders. It also concludes that the combined ActiveMQ release is under the Apache 2.0 license. However, noting those per-component licenses is still important because some downstream developers might be interested in using only part of the whole.
Packages, references, and relationships
The biggest change in SPDX 2.0 is that all of this information no longer has to be fused together into a single, massive document. Assuming that the individual components (i.e., javax.servlet and Jetty) ship with their own SPDX 2.0 documents, the SPDX 2.0 document for ActiveMQ can reference the licensing, copyright, and other metadata information from those other SPDX files. This is done with two additions to the format: a globally unique SPDX identifier for each document and an internal identifier for each XML element within the document. SPDX documents can thus reference individual elements inside of other SPDX documents unambiguously. In supply-chain terms, this means that the ActiveMQ project could preserve the SPDX documents that come with Jetty and javax.servlet, then create a shorter SPDX document for its combined release by simply including references to those existing files.
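To make this concrete, here is a minimal sketch in the specification's tag:value serialization (the spec also defines an RDF/XML form); the document names, namespace URLs, and checksum are hypothetical:

```
SPDXVersion: SPDX-2.0
DataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
DocumentName: activemq-example
DocumentNamespace: http://example.com/spdxdocs/activemq-example
## Pull in the SPDX document that shipped with Jetty, pinned by checksum:
ExternalDocumentRef: DocumentRef-jetty http://example.com/spdxdocs/jetty-example SHA1: d6a770ba38583ed4bb4525bd96e50461655d2759
## Elements in that document can now be referenced unambiguously:
Relationship: SPDXRef-DOCUMENT CONTAINS DocumentRef-jetty:SPDXRef-Package
```

The combined work's document thus points at the package element defined in Jetty's own SPDX document rather than copying its metadata wholesale.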
Perhaps a more subtle result of this change is that it is no longer necessary to create a separate SPDX file for each individual software package that a company or vendor releases. Thanks to the ability to unambiguously reference individual elements within a SPDX document, each document can contain more than one top-level "package" element. Of course, since each SPDX 2.0 document can contain multiple packages or can reference packages in other documents, parsing a SPDX file is not as simple as it was in the 1.x days. But such is the price of flexibility.
The format also now allows users to explicitly designate the relationships between various files, packages, and other elements. The relationships supported include simple dependencies, plus "generates" relationships (to, for example, designate that a particular binary file is generated from a specific source file), designations that a file is a test case or data file, notes that a file has been removed or altered from its upstream version, and so on.
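A few of these relationships, again in tag:value form; the relationship-type names below match my reading of the 2.0 specification, but the element identifiers are hypothetical:

```
## A binary generated from a particular source file:
Relationship: SPDXRef-activemq-jar GENERATED_FROM SPDXRef-activemq-src
## A file that is a test case for the package:
Relationship: SPDXRef-broker-test TEST_CASE_OF SPDXRef-Package-activemq
## A file modified from its upstream version:
Relationship: SPDXRef-servlet-patched FILE_MODIFIED SPDXRef-servlet-upstream
```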
It is also worth noting that some of these relationships capture what one might call time-sequential information—like the addition or removal of a file. This is a departure from the SPDX 1.x era, where only the current hierarchical state of the package was represented. The ability to record such time-based information is an aspect of the supply-chain model that the SPDX workgroup hopes to support with 2.0.
The files referenced in a SPDX document can also now include a variety of data formats, such as audio, video, images, documentation files, and generic plain text. Documentation is, of course, often text itself, but the semantic meaning may be valuable to downstream users of a package. A number of new hash algorithms are also supported, so that checksums for different file types can be recorded in the SPDX document, too. The upshot of all of these changes is that SPDX 2.0 can more completely capture the semantic meaning of the files and packages provided by a particular project.
Licenses
SPDX documents contain a section named "Other Licensing Information Detected" for referencing licenses that are not on the SPDX official list. As the Collaboration Summit presentation expresses it, the SPDX license list aims to cover 90% of the world's FOSS code, which it says can be done with a subset of the total license ecosystem—about 20 out of the 2000-plus licenses used in the wild. Nevertheless, the format somehow needs to account for that other 10%.
Off-list licenses are described with a set of XML attributes and RDF tag:value pairings. They include the license name, URL, and a text snippet (potentially the entire license). Despite this flexibility, in previous SPDX revisions there was no clearly defined way to express certain complex licensing situations. The most important is when a package is released under a choice of licenses (i.e., dual-licensed or tri-licensed). The 2.0 specification attempts to standardize how such licensing information is recorded by defining an augmented Backus-Naur Form (ABNF) syntax for expressing compound licensing relationships.
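For example, the choice-of-license cases from the ActiveMQ example above could be expressed along these lines, shown here as a sketch using the file-level LicenseConcluded tag and short identifiers from the SPDX license list (the LicenseRef- form is how an off-list license defined elsewhere in the document is referenced):

```
## Jetty: recipient may choose Apache 2.0 or Eclipse 1.0
LicenseConcluded: (Apache-2.0 OR EPL-1.0)
## javax.servlet: recipient may choose CDDL or GPL
LicenseConcluded: (CDDL-1.0 OR GPL-2.0)
## A conjunction including a hypothetical off-list license:
LicenseConcluded: (MIT AND LicenseRef-AcmeEULA)
```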
Like other elements in a SPDX 2.0 document, the off-list licenses described can be referenced elsewhere in the file by their internal ID, and cross-referenced by other SPDX documents.
And more
Among the other noteworthy changes to SPDX 2.0 is an extension of the Package Download Location field within the package-information section. Starting with this version, the field can contain a reference to a version-control system (VCS), whereas earlier versions expected an HTTP URL. The VCS references supported include Git, Mercurial, Subversion, and Bazaar, using HTTPS or SSH transport URLs. The syntax is taken from that used by the Python Package Index, and includes support for denoting specific branch, sub-path, commit hash, and tag names.
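The resulting field values look something like the following (project URLs hypothetical):

```
## Git over HTTPS, pinned to a tag, restricted to a sub-path:
PackageDownloadLocation: git+https://git.example.com/project.git@v2.1#src/component
## Mercurial over HTTPS, pinned to a commit hash:
PackageDownloadLocation: hg+https://hg.example.com/project@da39a3ee5e6b
```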
Finally, one feature from SPDX 1.x has been deprecated in SPDX 2.0: a new "Annotation" section replaces the "Review Information" section of earlier versions. The now-deprecated section was used to record human review information: the reviewer, review date, and an optional comment. All of that has been replaced by a more general-purpose annotation system. The format is essentially the same; an annotation has an "annotator" field, plus a date and room for comments. The new wrinkle is that annotations can reference any SPDX element by its identifier. Thus, they can note per-element changes, rather than being attached only to the whole package (as was the case in SPDX 1.x).
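In tag:value form, an annotation would look roughly like the sketch below (the name, date, and target are hypothetical; the SPDXREF line is what ties the annotation to a specific element):

```
Annotator: Person: Jane Doe (jane.doe@example.com)
AnnotationDate: 2015-05-14T10:00:00Z
AnnotationType: REVIEW
SPDXREF: DocumentRef-jetty:SPDXRef-Package
AnnotationComment: <text>Re-verified the license of the bundled Jetty copy.</text>
```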
What's next
It will be interesting to see how SPDX 2.0 catches on with users, particularly within corporate software environments (where users tend to want far more detailed provenance information about the components they use than volunteer projects do). There is a lot of additional flexibility in SPDX 2.0, but that flexibility does mean that SPDX users will have to restructure the metadata files for their packages. For large and complex software projects, the complexity of the new format compounds the amount of work required.
It is also interesting to note what did not make it into SPDX 2.0. In particular, the requirements document noted earlier mentions a desire to increase the granularity with which metadata can be recorded. Per-file licensing is too coarse for some projects, so there was interest in accommodating finer-grained components (e.g., functions or classes). This does not appear to have made it into the 2.0 release. Nor did one other proposed requirement: a mechanism to verify the creator and reviewer information listed in the SPDX metadata for a package. Presumably, such a mechanism would resemble the one already used to provide a means for verifying file integrity—by, say, including GPG signatures—but the feature was tabled. The SPDX workgroup indicates that the standard will not be sitting still, however, so perhaps these and other features will soon be implemented in a new update.
New projects from day two of CoreOS Fest
While day one of CoreOS Fest 2015 introduced the CoreOS architecture, plans, and specifications, day two introduced multiple open-source projects and tools. Presentations covered systemd-nspawn, Project Calico, Sysdig, and others. Most of these projects have been in development for a year or more, but the talks at the conference were the first look for most attendees.
While the talks themselves were interesting, the most remarkable thing was the sheer number of new tools that have been developed in the last year or so. Building up the software scaffolding for Linux containers seems to have happened faster than many other major changes introduced in Linux. One of the most fundamental pieces of this new infrastructure is systemd — the new init system for Linux — with its support for containers "out of the box".
Systemd and CoreOS
Lennart Poettering of Red Hat gave a presentation on systemd and CoreOS, describing the systemd tools that integrate with container management. Building containers using systemd really displays its benefits compared with other init systems, according to Poettering; systemd supplies all of the tools required to manage diverse containers on a single machine.
He noted that his talk was not an official Red Hat presentation, but then spent a fair amount of time speaking for the systemd team at Red Hat. The team isn't a product team, he explained, "we consider ourselves more of a research department than people who work on products."
This attitude explains some of the design decisions, such as choosing Btrfs as the primary filesystem for systemd-nspawn templates and containers. "Btrfs has a reputation for instability, but [the Btrfs project] is trying to solve fundamental filesystem issues," he said. Also, he explained that it is acceptable for containers to run on an unstable filesystem because "they're not where the data is." Important user data should be stored in external volumes, not in the container.
Systemd has multiple daemons that support containers, including systemd-machined, systemd-networkd, and systemd-resolved. In general, all of systemd is container-compatible because, according to Poettering, "systemd is tested on containers more often than on bare metal". Using containers allowed him to test init without rebooting his laptop frequently. He sees this deep integration with containers as a vital feature for Linux; "containers should be part of the OS itself, like Solaris Zones are."
It is also the goal of his team to be container-agnostic, supporting not just rkt, but also Docker, libvirt-lxc, OpenVZ, and others. The idea is that while systemd supplies a lot of container utility, it should be a low-level building block and not provide a sophisticated user interface. Projects like CoreOS and Kubernetes can then use systemd's functionality for basic operations.
Systemd-machined and its command-line tool, machinectl, are the most obvious piece of container management in systemd. With machinectl, users can list, start, stop, and even log in to containers interactively. Systemd-machined is "really just a registry of containers" with which any container can register. Further, it can be used together with systemd to run any command inside a container using "systemd-run -M". Systemd-machined also allows running containers to appear in ps command listings and in GNOME's system monitor.
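As a quick sketch of that workflow (the container name "webapp" is hypothetical):

```
# List containers registered with systemd-machined:
$ machinectl list
# Log in to a running container interactively:
$ machinectl login webapp
# Run a single command inside the container:
$ systemd-run -M webapp /usr/bin/uptime
```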
Systemd-nspawn is a lightweight container executor that provides a Docker-like tool that can start and run containers. It can be used to start a container using any filesystem or block device containing an MBR or GUID partition table. For users who want a limited-feature container manager that requires no configuration, systemd-nspawn will be an attractive option. Rkt uses systemd-nspawn under the hood to run container instances.
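A minimal example of both modes (paths hypothetical):

```
# Boot the OS tree found in a directory:
$ sudo systemd-nspawn --boot --directory=/var/lib/machines/webapp
# Boot from a raw disk image carrying an MBR or GUID partition table:
$ sudo systemd-nspawn --boot --image=/path/to/os-image.raw
```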
Systemd-networkd and systemd-resolved, the network and host-name-resolution daemons of systemd, also support containers. Systemd-networkd will automatically start a container's networking and do internal DHCP address assignment. Systemd-resolved provides host names for containers, using "link-local multicast name resolution", or LLMNR, an automatic name-discovery system invented by Microsoft. While LLMNR was designed for client applications and mobile devices, it can be used by containers to find each other on the network.
Based on Poettering's presentation, it seems like systemd will offer a strong alternative to Docker's libcontainer and other container initialization and management tools. Since the systemd tools will be built into most versions of Linux, they will eventually be widely available by default in many user environments. Perhaps that's why so many of the companies in the container business are focused on orchestration, which is one area where systemd doesn't concern itself.
Go and containers
One of the things that CoreOS, Inc. CEO Alex Polvi announced during his keynote was the company's sponsorship of the second Gophercon, the conference for Go language programmers. In fact, if you look at the list of sponsors for Gophercon, you'll see six of the major container-promoting companies listed there, which is around a quarter of the overall sponsors. This is not a coincidence; both CoreOS, Inc. and Docker, Inc. use Go almost exclusively. "Etcd could only have been built with Go," said Polvi.
In almost every talk and meeting room at the container conferences I've been to, people are talking about, and coding in, Go. Docker is written mostly in Go. Etcd, fleet, Swarm, Kubernetes, Kurma, and many other utilities and daemons for containers were built with it. The rise of Linux containers as a platform is likely to also be the rise of Go as a language.
Go started at Google in 2007 as an internal project with three developers, and today has over 500 contributors both inside and outside Google. The project is open source under the BSD license, but it is still run by Google staff and contributing requires signing a Contributor License Agreement (CLA) with Google. Increasingly, Go is used as an "automation language" for scalable server infrastructure; prior to Linux containers, it was popular for implementing network proxies, cloud server management tools, distributed search engines, and redundant data stores. So it's perhaps unsurprising that container utility programmers should also have chosen the language.
Because of CoreOS's close ties with Go, Brad Fitzpatrick gave a general session on Go's continuous build infrastructure. Fitzpatrick, known for LiveJournal, memcached, and OpenID, is now on the Google Go team. He presented at the conference on the automated build infrastructure that is used to test the language on many possible platforms. It started out as a Google App Engine application, plus a chain of mobile devices on Fitzpatrick's desk, and grew. His talk covered some of the history and mechanics of how it works.
Since Go is a compiled, rather than interpreted, language, it's critically important for users to know that binaries will execute on different platforms. Every check-in of Go gets built on hundreds of platform variations in a large machine lab at Google. Containers play a minor part in this because so many of the platforms to be tested don't support, or work with, containers. Linux variants are tested using Docker, but operating systems like Mac OS X and Android need special-purpose hardware to test them. You can see the current build test status and which builds are broken for various platforms on the Go Dashboard.
Project Calico
While Project Calico has been open source for almost a year, it was new to most of the audience when core developer Spike Curtis presented it. Calico is multi-host network routing software that includes a distributed, per-service firewall. It is designed for containers and virtual machines, especially Docker and OpenStack environments. The project is written in Python and developed by Metaswitch Networks, which is currently Calico's only commercial support vendor. Calico looks like a potential solution for users who want to deploy containers in production, but have stringent security requirements.
"Remember three-tier architectures?" complained Curtis. "That's still how admins secure networks. You have your external network, your DMZ with web resources, and your data layer, which needs to be the most secure."
"Microservices" running in containers on an orchestration network break down this three-tier model. First, microservices are defined by what service they provide, rather than their security characteristics. Second, orchestration frameworks expect an undifferentiated data center network and aren't designed with the concept of security tiers. Most of all, microservices require defining security policies and zones for literally hundreds of entities, instead of the few dozen network administrators expect. As he described, "it's a zoo and you've torn down the walls."
However, microservices offer a security opportunity as well. Because each one only does one thing, you can characterize its security requirements in simpler terms. This means that services can be compartmentalized in a more sophisticated way without added complexity, and that's what Project Calico is designed to do.
With each microservice or container mapped to a single IP address, Calico implements a simple iptables-based firewall running on each physical host for each of those IP addresses. Each service is defined by tags stored in etcd, and a JSON-formatted configuration file defines which other services are allowed to connect to it — or if it's available to the Internet.
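The exact schema is best taken from the Calico documentation of the day, but a hypothetical profile for a database service reachable only from web servers might look roughly like this:

```
{
  "id": "database",
  "inbound_rules": [
    { "action": "allow", "src_tag": "webserver",
      "protocol": "tcp", "dst_ports": [5432] },
    { "action": "deny" }
  ],
  "outbound_rules": [
    { "action": "allow" }
  ]
}
```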
Project Calico is designed to integrate with any orchestration framework that supplies an IP address for each service. Curtis demonstrated using Calico with Kubernetes, including using an extended Kubernetes pod definition to define security settings for each container. Apache Mesos is currently working on the IP-per-service feature, so it doesn't work with Calico yet.
Sysdig
The final "new" project described at CoreOS Fest was Sysdig. Like Project Calico, it was released about a year ago but most attendees saw it there for the first time. Also, like Project Calico, Sysdig is backed by a single company, Sysdig Cloud, which offers commercial support for the tool. Loris Degioanni, CEO of Sysdig Cloud, presented the tool at CoreOS Fest.
Sysdig is a traffic-monitoring system that is partially implemented as a Linux kernel module. The module captures all network traffic on the system, especially traffic between containers. The sysdig tool supports filtering this information and processing it with scripts written in Lua (called "chisels"), which allows users to aggregate it for statistical analysis. It can be thought of as a container-aware, more advanced combination of Wireshark and tcpdump.
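For instance, filters and chisels can be combined on the command line (the container name is hypothetical):

```
# Show IPv4 traffic events belonging to one container:
$ sudo sysdig container.name=webapp and fd.type=ipv4
# Rank processes by network bandwidth using the topprocs_net chisel:
$ sudo sysdig -c topprocs_net
```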
Degioanni said that Sysdig is an improvement on the Google cAdvisor project — frequently used with Docker containers — because cAdvisor only tells you about overall CPU, memory, and network usage of containers. Sysdig also gives you the ability to distinguish the endpoints and content of traffic. This means that you can, for example, filter for certain database queries, or troubleshoot unusual lag between two specific IP addresses.
One of the things Degioanni demonstrated was the soon-to-be-released open-source curses-based user interface for Sysdig, which is intended to allow system administrators to do interactive monitoring over SSH. He showed how to dig into traffic between containers and summarize it, as well as how to look into network delays. At the Sysdig Cloud booth, its staff showed off a much fancier, proprietary graphical user interface that supports clicking through to nested layers of servers, pods, and containers.
Day two wrap up
The new projects, tools, draft standards, and architectures I learned about at CoreOS Fest showed the rapid pace of development in the Linux container world. A year ago, when I reported on the first DockerCon, most of the techniques and tools covered at CoreOS Fest had just been launched or didn't even exist. Next year, we will see whether development continues at such a high velocity.
Of course, there's one major topic we haven't yet covered: the ongoing issue of storing persistent data in containers. As mentioned above, there is currently an expectation that containers are stateless and do not keep data. Removing that expectation raises a number of problems for container management and orchestration that are only beginning to be addressed, such as management of external volumes, container migration, and load-balancing of stateful services. Join us next week for coverage of multiple topics related to persistent data and containers from both CoreOS Fest and Container Camp.
Font editing with Glyphr Studio 1.0
Glyphr Studio is a free-software font-editing application that runs inside a browser window. Version 1.0 was released on May 7. Although it is not yet a full-featured application, it does provide a good interface for the core tasks of font development.
![[Editing in Glyphr Studio 1.0]](https://static.lwn.net/images/2015/05-glyphr-edit-sm.png)
The Glyphr Studio code is hosted at GitHub. The application is licensed under GPLv3, and is based on a number of open-source JavaScript libraries. At the moment, it can only be run from a web server; the latest release is accessible online at glyphrstudio.com/online. That means one must have an active Internet connection to work, of course. There is a feature request open to package the application for offline usage with Node.js, but it is not clear how far away such a change might be.
The project has been in development since 2010 (although it was a private, personal project in its early days), and has regularly made public beta releases since 2013. The most recent of those releases, Beta 5.2, arrived in January 2015. Subsequently, developer Matt LaGrandeur announced a roadmap in which the 1.0 milestone would designate that import/export functionality had landed, thus making the application useful for basic work. To implement that functionality, LaGrandeur chose to use the existing OpenType.js library. As it turns out, integrating OpenType.js proved to be such an easy transition that the 1.0 release also incorporated several other useful features.
Production
Glyphr Studio 1.0 offers a vector-based drawing canvas on which users can create the glyphs in a font, tools for constructing ligatures and setting up basic kerning, an interface for configuring basic font metadata, and a playground for testing out the in-development characters on some sample text. Users can also import generic SVG shapes, so they can design glyphs in another application first and bring them into Glyphr Studio for further refinement.
![[Settings in Glyphr Studio 1.0]](https://static.lwn.net/images/2015/05-glyphr-fontsettings-sm.png)
As far as the import/export functionality is concerned, at the moment it is basic. Users can import SVG fonts (as distinct from the generic SVG files that can be pulled in as drawing elements), OpenType .OTFs, and TrueType .TTFs. Export is a tad more limited. The program can export SVG fonts or OpenType .OTFs, but those .OTFs will not store any of the kerning or ligature tables. This is a limitation inherited from OpenType.js; when that project adds the requisite features, Glyphr Studio will pick them up. SVG fonts can be exported with both kerning and ligature tables.
Glyphr Studio can also save projects in its own, JSON-based format—files that will be saved to the user's local machine, not kept on the server, so there is little risk of data loss while one waits for OpenType.js to mature. In practice, the bigger limitation is that Glyphr Studio cannot import or export the Unified Font Object (UFO) file format used by many other tools, but there are feature requests open for that, too.
That said, there are a variety of font-engineering tools that can operate directly on OTF files, including scripts to insert kerning and ligature tables. These tables store relatively simple information (e.g., a kerning pair consists of two references to glyphs and an integer saying how much the space between those glyphs should be adjusted when they appear in order). It might not be trivial, but one could probably use Glyphr Studio in conjunction with some command-line utilities to develop and release a fully functioning text font.
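In the SVG font format that Glyphr Studio already exports, for example, a kerning pair is a single element; a sketch (glyph names and value hypothetical):

```
<!-- Pull "A" and "V" 80 font units closer when they appear in sequence -->
<hkern u1="A" u2="V" k="80"/>
```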
Usage
The ability to produce a technically valid font is not what most people will be interested in, however. Designing and testing all of the glyphs that go into a usable finished product is a labor-intensive task that can be quite repetitive. Where such an interactive application often succeeds or fails is on the ease of use of its editing interface and how well its tool set stands up to the user's requirements and expectations.
The Glyphr Studio interface does a good job of being easy to navigate and unobtrusive. It can be run in a single browser window or the editing canvas can be split out into a window of its own to provide maximum editing space while still providing access to the on-screen settings and tools. Within the editing canvas, the tools and navigation buttons (e.g., zoom and pan) are straightforward enough. The navigation and tool panels (i.e., the parts of the interface outside of the editing canvas) are clear and well-organized, too, providing one-click access to each of the major tasks.
![[A disappearing control-point handle in Glyphr Studio 1.0]](https://static.lwn.net/images/2015/05-glyphr-missingpoint.png)
As with all modern font editors, the emphasis is on creating and adjusting Bézier contours. In my tests—most of which were conducted with Glyphr Studio 1.0 running in Chromium—I encountered some oddities. The curve-adjustment tools occasionally threw a surprise at me, such as the handles for a control point disappearing when I selected a different point. I am also not quite convinced that I like the interface's habit of making options disappear entirely when they are not needed (rather than simply graying them out). This behavior makes the other options and buttons jump around on the screen, which seems like it would impede one's ability to commit the UI to muscle memory.
There are also isolated parts of the interface that are more difficult to navigate than others merely because they reuse icons and symbols from other places in the UI. Presumably that can be fixed. In addition, there are a few places where features seem to be only partially implemented—such as layer support, where the buttons to move an object up or down in the layer stack are present, but there is no way to add layers other than the default layer itself.
On the other hand, Glyphr Studio deserves high marks for laying out options and controls in a manner that relates directly to the workflow that a user will employ. For example, as noted earlier, the navigation panel includes buttons for each of the major tasks in designing a font (at least, those currently supported in Glyphr Studio): glyph editing, ligatures, kerning, and test-driving. That more or less reflects how most type designers seem to operate. Those four tasks are interrelated; users tend to jump back and forth between them, particularly when designing something from scratch.
![[Kerning in Glyphr Studio 1.0]](https://static.lwn.net/images/2015/05-glyphr-kerning-sm.png)
In contrast, FontForge—which, for many years, has been the dominant free-software font editor—also offers kerning and ligature construction, but the tools used to work on those tasks are accessible only from within a menu or buried several levels deep in modal dialog boxes. FontForge's functionality is more complete, but it can be quite a bit harder to use. On a daily basis, the inconveniences can wear some users out.
The one genuinely original feature that Glyphr Studio 1.0 offers is its "components" support. Components were a surprise addition to the 1.0 release when the OpenType.js integration went faster than expected. In essence, a component in Glyphr Studio is a reusable shape that the user can draw once and then reference in as many glyphs as necessary. The simplest example might be an accent mark, which readers expect to look the same on a variety of letters, but it can be anything: the "feet" on certain Cyrillic letters, the lower-case "n" reused as the left half of "m", and so on.
Most type designers seem to employ some form of this technique already (though informally; perhaps by putting the reusable component in a slot assigned to no Unicode point), so it is interesting to see it elevated to the level of first-class feature. Whether it attracts and retains new users will be interesting to watch.
The last word
Two weeks ago, we reported on the Libre Graphics Meeting session in which Dave Crossland opined that Glyphr Studio and some other web-based font-development tools would soon surpass FontForge in functionality. Glyphr Studio 1.0 does not do that, although it does raise the bar in terms of usability, and it provides a serious challenge to FontForge going forward.
That is not to say Glyphr Studio has a clear road ahead of it; as the conventional wisdom says, implementing the initial, basic functionality is often easy for a development team. Filling in that "last 10%" is what adds unexpected complexity and ends up taking 90% of developers' time. In the meantime, though, Glyphr Studio is a worthy project for those in graphic-design fields to pay attention to. Regardless of what the future holds, another quality free-software option for font developers is a win for the community.
Page editor: Jonathan Corbet
Inside this week's LWN.net Weekly Edition
- Security: Logjam; New vulnerabilities in gnutls, moodle, php, wireshark, ...
- Kernel: Delay-gradient congestion control; SYN packet fingerprinting; Clear Containers.
- Distributions: A preview of Fedora 22; PC-BSD 10.1.2, CentOS 7 for AArch64, ...
- Development: PostgreSQL: the good, the bad, and the ugly; Rust 1.0; 20 years of Qt; Twisted 15.2; ...
- Announcements: LibrePlanet videos, articles from Opensource.com and Linux.com, events, ...