
Leading items

Systemd and containers

By Jonathan Corbet
June 10, 2015

LinuxCon Japan
As Lennart Poettering noted at the beginning of his LinuxCon Japan talk, containers are "all the rage" these days. He was there to talk about how systemd works with containers — on both the inside and the outside. The systemd developers believe that containers should be a core part of an operating system, a feature that is provided out-of-the-box, rather than being an add-on. The container concept should be integrated all the way through the operating system. This idea, Lennart admitted, is not new; it also characterized the Solaris Zones implementation.

Systemd development is also driven by the idea that the operating system running inside a container should be as similar as possible to that running outside. Applications should not care whether they are running inside a container or not; the operating system should hide the differences between those two settings. Lennart called this concept "integrated isolation." There is isolation in that the container should appear to be its own host, but there should be full integration with the real host; a system-log query, for example, should be able to return results from the host, from one or more containers, or all of the above.

The Docker system, Lennart said, wants to have a single tenant running in each container. Systemd takes a different approach. Daemons that run on the host, for example, should generally run within the container as well.

There are a number of other overriding principles that drive systemd container development. The core system should be minimal, without a lot of options for users to wade through — but the door should be open for alternatives when the need arises. The focus should be on getting the low-level parts right. "No hacks"; the implementation should be clean. That is why Btrfs is used with native systemd containers rather than adding the logical volume manager to the mix. New standards should not be created when standards already exist; thus, for example, systemd doesn't include its own packaging system since Docker and rkt already exist. Most systemd testing is done inside containers; that simply makes life easier, since it's no longer necessary to reboot the system for each testing cycle. Finally, systemd is not focused on cluster-wide orchestration; instead, it provides the building blocks that allow others to create solutions at that level.

The core of the systemd approach to containers is a daemon called systemd-machined; it serves as a registry for containers running on the system. Other tools that have container awareness integrate with systemd-machined to get their job done. Thus, the systemctl command, when given the -M option, will operate on a specified container rather than on the host. The -r option causes services running within containers to be listed as well. There is a list-machines subcommand to create a list of containers along with some health status. Other command-line tools, including loginctl, systemd-run, and journalctl, also understand the -M option. The machinectl command-line tool can be used to log into or stop a container. And so on.
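Under the hood, these tools all talk to systemd-machined over its D-Bus API (org.freedesktop.machine1). As a rough illustration, and assuming the third-party pydbus Python module is available, a few lines of code can enumerate the registered machines much as machinectl list does:

    # Minimal sketch: list containers registered with systemd-machined,
    # roughly what "machinectl list" shows.  Requires the pydbus package
    # and a running systemd-machined on the system bus.
    from pydbus import SystemBus

    bus = SystemBus()
    # Proxy for systemd-machined's manager object at /org/freedesktop/machine1.
    machined = bus.get("org.freedesktop.machine1")

    # ListMachines() returns (name, class, service, object-path) tuples.
    for name, klass, service, path in machined.ListMachines():
        print("%-20s %-10s %s" % (name, klass, service))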

Work has been done to integrate with other system tools as well. There is an option to the ps command to see which container owns any given process. Gnome-system-monitor has been extended in similar ways. The name service switch (NSS) subsystem has been enhanced with a module to do hostname resolution for containers. The sd-bus IPC library is container-aware. Importantly, all of these mechanisms work with the various container-management systems available on Linux; they are not limited to systemd's containerization mechanism.

That mechanism is built around systemd-nspawn, a simple container manager; the rkt container system was built on top of systemd-nspawn at its lowest levels. Systemd-nspawn is a minimal system that, naturally, integrates with systemd-machined. All it needs is a directory full of operating-system files and it will be able to boot from there. It can also boot raw disk images, the same ones that work with KVM or on bare metal. So, for example, Fedora Cloud images work with systemd-nspawn.

It can also be set up to provide "containers as a service." Each container can be a normal systemd service, meaning that all of systemd's resource-management and process-management features are available.
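Recent systemd releases ship a systemd-nspawn@.service template for this purpose; as a minimal sketch (the container name, directory, and limits below are illustrative, not taken from the talk), a hand-written unit could look like this:

    # /etc/systemd/system/webapp-container.service -- illustrative example
    [Unit]
    Description=Example container run as an ordinary systemd service

    [Service]
    # Boot the OS tree under /var/lib/machines/webapp as a container.
    ExecStart=/usr/bin/systemd-nspawn --boot --machine=webapp \
              --directory=/var/lib/machines/webapp
    # Normal systemd resource management applies to the container as a whole.
    MemoryLimit=512M
    CPUShares=512
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target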

One of the newer tools is systemd-networkd, added about one year ago. It adds a network-configuration mechanism that, Lennart said, is "minimal" but "really powerful." A systemd-networkd daemon running on the host will, when a virtual Ethernet tunnel appears for a container, pick an IP range for the container, run a DHCP server for it, set up IPv4LL, set up network address translation, etc. When running in the container, instead, it will launch the DHCP client and perform IPv4LL setup. The end result is that networking for containers just works with no additional configuration needed.

Another recent addition is systemd-resolved, which does central hostname resolution. It is a centralized daemon, avoiding many of the hassles that come with NSS resolution; in particular, it maintains a single, global cache. There is no longer a need to run a resolver in each application, which, among other things, should reduce the attack surface of the system. Lookups with LLMNR are set up by default.

The systemd-import tool can import and export container images. It understands simple formats like tarballs and raw disk images; it can also import from (but not export to) the Docker format. This tool is typically used via the machinectl pull-raw command to get container images off the net. These images can then be booted directly with systemd-nspawn.

Lennart closed the session with a brief discussion of the concept of stateless systems. These systems have everything they need to boot in /usr; other directories, including /etc and /var, are created on the first boot. With such systems, a factory reset is easily achieved by removing everything except /usr. To achieve a truly stateless system, one just has to always boot in this manner. Stateless systems are useful for containers, since it is possible to instantiate a lot of them and have them set themselves up on the first boot. A single /usr image can be used as a golden master image, making for easy centralized updating. The stateless systems idea is relatively new, but the systemd developers are hoping to push distributors in that direction.

Lennart's slides [PDF] are available for the curious.

[Your editor would like to thank the Linux Foundation for funding his travel to LinuxCon Japan]

Remote vehicle interaction in automotive Linux

By Nathan Willis
June 10, 2015

ALS 2015

At the 2015 Automotive Linux Summit in Tokyo, a team from Jaguar Land Rover (JLR) gave attendees the first look at the company's plans for implementing remote vehicle interaction (RVI) in Linux-based automotive systems. RVI can encompass a wide array of features that use long-range networking, but JLR's project focuses on just a few: controlling simple mechanical systems with mobile apps, remote data logging, and updating car software over-the-air (SOTA).

[Matt Jones]

The session was co-presented by Matt Jones and Arthur Taylor. Jones noted that there is an RVI expert group within the Automotive Grade Linux (AGL) project and that several GENIVI members have been participating as well. The implementation presented in the session is thus something of a joint effort—JLR being an active member of GENIVI and AGL. The expert group's mission statement is to specify, plan, and build a reference implementation of the infrastructure required for RVI. But, he added, the group had no plans to stop development after it completes support for its initial three RVI use cases. "We'll keep adding things as new use cases are presented. Please bring us your ideas."

The group is also committed to building a working implementation of the entire framework, Jones said. "We don't believe it's enough for this to be a computer science project." A real-world implementation, he said, has to be secure, thoroughly tested, and reliable enough that system manufacturers are willing to integrate it. Thus, the group has developed its implementation in the open; the code is available on GitHub.

The first piece of the system is a server component designed to run in a data center (not in a car); it receives, processes, and forwards messages between vehicles and other software services. On the vehicle's side, there is a corresponding client daemon. But, Jones pointed out, the framework does not require setting up a direct client-to-server link. Rather, the network model uses a store-and-forward design, so that it can cope with poor or intermittent network connections. Vehicles can also relay messages in a peer-to-peer fashion, which is particularly useful for fleet deployments.
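The store-and-forward idea itself is simple; the following Python sketch (purely illustrative; the real server is written in Erlang, as noted later) queues messages per vehicle and flushes the backlog whenever the vehicle next connects:

    # Illustrative sketch of a store-and-forward relay: messages for a vehicle
    # are queued until it connects, then delivered in order.  Not the actual
    # RVI server implementation.
    from collections import defaultdict, deque

    class StoreAndForwardRelay:
        def __init__(self):
            self.queues = defaultdict(deque)   # vehicle id -> pending messages
            self.online = {}                   # vehicle id -> delivery callback

        def send(self, vehicle_id, message):
            """Queue a message; deliver immediately if the vehicle is reachable."""
            self.queues[vehicle_id].append(message)
            self._flush(vehicle_id)

        def vehicle_connected(self, vehicle_id, deliver):
            """Called when a vehicle comes online; flush any backlog."""
            self.online[vehicle_id] = deliver
            self._flush(vehicle_id)

        def vehicle_disconnected(self, vehicle_id):
            self.online.pop(vehicle_id, None)

        def _flush(self, vehicle_id):
            deliver = self.online.get(vehicle_id)
            queue = self.queues[vehicle_id]
            while deliver and queue:
                deliver(queue.popleft())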

The server supports the creation of basic user and group accounts that can be used to authenticate incoming messages and check that any included commands are authorized. It also tracks vehicles by name, make, model, and vehicle identification number (VIN). How this factors into authentication is most easily seen in the group's "control" demo, which provides an app for Android and iOS that can be used to unlock the car door. Each user account on the server can have multiple cryptographic keys, each of which is bound to a specific mobile device running the app.

Using a web application, the user can grant each key the "lock/unlock" permission for a specific vehicle ahead of time. When the app sends the unlock command to the server, the server checks that the originating device has the correct permissions before it relays the command to the car. For testing, Jones said, the unlock mechanism was a Raspberry Pi wired into an electrical relay on the door lock. The team is currently extending this system to support the control of additional features, with heater/air-conditioning control coming first.
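In outline, that server-side check amounts to verifying the message against the device's registered key and consulting the permissions granted through the web application. The Python sketch below is hypothetical (the message fields and helper names are invented for illustration), but it captures the flow:

    # Hypothetical sketch of the server-side authorization step for the
    # lock/unlock demo: confirm the message came from a registered device key
    # and that the key holds the needed permission for that vehicle.
    def authorize_command(message, registered_keys, grants, verify_signature):
        """Return True if the command may be relayed to the vehicle.

        message          -- dict with 'device_id', 'vin', 'command', 'payload', 'signature'
        registered_keys  -- maps device_id -> public key bound to that device
        grants           -- set of (device_id, vin, permission) tuples from the web app
        verify_signature -- callable(public_key, payload, signature) -> bool
        """
        key = registered_keys.get(message["device_id"])
        if key is None:
            return False                    # unknown device
        if not verify_signature(key, message["payload"], message["signature"]):
            return False                    # forged or corrupted message
        needed = {"unlock": "lock/unlock", "lock": "lock/unlock"}.get(message["command"])
        return (message["device_id"], message["vin"], needed) in grants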

The second use case tackled by the group was remote data collection. This is a somewhat simpler application, as it involves only sending data packets from the vehicle to the RVI server, where the data is logged in a database. The example implementation uses an off-the-shelf on-board diagnostic (OBD-II) dongle to capture status messages, storing the collected data in a Cassandra database.
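A sketch of what such a logging loop might look like, using the Python cassandra-driver package, is shown below; the keyspace, table layout, and the read_obd_snapshot() helper are invented for illustration, since the demo's actual schema was not described.

    # Illustrative data-logging loop: take periodic OBD-II snapshots and
    # store them in Cassandra.  Uses the cassandra-driver package; the
    # keyspace, table, and read_obd_snapshot() helper are hypothetical.
    import time
    import uuid
    from cassandra.cluster import Cluster

    def read_obd_snapshot():
        """Placeholder for reading speed, RPM, etc. from the OBD-II dongle."""
        raise NotImplementedError

    def log_forever(vin, interval=1.0):
        session = Cluster(["127.0.0.1"]).connect("rvi_logging")
        insert = session.prepare(
            "INSERT INTO vehicle_data (vin, id, ts, speed, rpm) VALUES (?, ?, ?, ?, ?)")
        while True:
            data = read_obd_snapshot()
            session.execute(insert,
                            (vin, uuid.uuid4(), int(time.time()),
                             data["speed"], data["rpm"]))
            time.sleep(interval)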

Unlike the door-locking test, which is a purely internal project, there has already been outside interest in the data-logging feature. The Portland public transportation system got in touch with the group to ask for help developing an open-source solution for collecting data from its bus system, Jones said. A solution for Portland is still in development, but the project recently completed its first data-logging field test, which collected one month's worth of data from a taxi fleet in San Francisco; the team also built web applications to plot the data historically and to provide a live overview of the fleet's current status. Taylor demonstrated both web applications.

The final use case targeted by the team is SOTA, the over-the-air software update functionality, which Jones said can apply to several different types of updates. Manufacturers may want to deliver an update to a specific subsystem, third-party developers will periodically want to update their applications, carmakers may need to issue a critical security patch, and so on. In addition, he said, consumers have gotten used to the idea of upgrading their PC and phone operating systems periodically, so he expects that car owners will want the same feature—few car owners will be content to run a ten-year-old OS when all of their other electronic devices can be updated in the field.

So far, the team has implemented per-package updating. The RVI server notifies the car computer that an update is available and sends a network location from which the car can request the update when it is ready. The car asks the user to authorize the update, after which it downloads the new package. Currently, both RPM packages and binary diffs are supported formats. The package sent by the server is first encrypted using a per-vehicle key pair to guard against man-in-the-middle attacks. Jones noted that software key pairs and hardware security modules are both supported by the framework. If the car computer decrypts and validates the package, it can install it and report success back to the RVI server; if not, it reports the failure.
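Roughly, the client side of that exchange could be outlined as follows; the function and field names in this Python sketch are placeholders rather than the project's actual API:

    # Hypothetical outline of the client-side SOTA sequence described above:
    # notification -> user consent -> download -> decrypt/verify -> install -> report.
    def handle_update_notification(notice, rvi_report, ui, fetch, crypto, installer):
        """notice: dict with 'package', 'version', and 'url' sent by the RVI server.

        rvi_report -- callable(status dict) that sends the result back to the server
        ui         -- object with ask_user(text) -> bool
        fetch      -- callable(url) -> encrypted bytes
        crypto     -- object with decrypt_and_verify(blob) -> plaintext or None
        installer  -- object with install(package_bytes) -> bool
        """
        if not ui.ask_user("Install %s %s?" % (notice["package"], notice["version"])):
            rvi_report({"package": notice["package"], "status": "declined"})
            return
        blob = fetch(notice["url"])
        package = crypto.decrypt_and_verify(blob)     # per-vehicle key pair
        if package is None:
            rvi_report({"package": notice["package"], "status": "validation-failed"})
            return
        ok = installer.install(package)
        rvi_report({"package": notice["package"],
                    "status": "installed" if ok else "install-failed"})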

From the client's perspective, this is fairly straightforward (not too different from the way packages are updated on other Linux systems). The RVI server has some more complicated matters to keep track of, however. It can queue updates for multiple vehicles, set time windows to open and close access to the update system, restrict access by geographic location, stagger releases out by VIN ranges, and so on.

For now, the system stores the package updates in the RVI server database itself, although plugins to support GitHub and other package sources are in development. Jones noted that the server can also deliver full disk images, although this capability has not been rolled out.

All three frameworks are, for now, fairly simple in scope. But they will get more complicated as the project progresses. The system is already modular: a number of database backends are available for the RVI server, and additional services can be added. Jones said that although the project itself is open source, it has been designed so that proprietary plugins will be possible, since some carmakers or software developers are likely to expect them. In response to an audience question, he said that the server is written mostly in Erlang, while the demo applications are primarily in Python, and the web front-ends (such as the unlock-permission application) are based on Django.

Security, of course, is a significant concern. Jones pointed out that the RVI server and client both use standard Linux security modules, leaving the details up to the implementer. Thus, since Tizen IVI uses Smack for access control, its client-side lock/unlock support would use Smack to restrict access to the door-lock service. Another system might use an entirely different configuration.

Another audience member asked how the unlock system would cope with the user's phone being stolen. Jones replied that the user can go to the web site and de-authorize the phone—which makes it more secure than physical car keys, which can also be stolen. Someone in the crowd then asked what would happen if the phone was stolen "in the desert" where there was no Internet access—to which Jones replied it would be much like having one's keys stolen in the desert.

The notion of RVI has long been the source of some unease for a portion of the developer community, perhaps because of the periodic news stories highlighting how insecure contemporary car-computing systems are. But it is worth noting that this is an area where Linux and open-source software have a chance to improve on the status quo, rather than sink to its level. The AGL expert group still has a lot of work ahead of it to implement a full-fledged RVI system, but it is certainly good to see that it is taking multiple use cases, inconsistent network access, and security into account from the beginning.

[The author would like to thank the Linux Foundation for travel assistance to attend ALS 2015.]

Resurrecting the SuperH architecture

By Nathan Willis
June 10, 2015

LinuxCon Japan

Processor architectures are far from trivial; untold millions of dollars and many thousands of hours have likely gone into the creation and refinement of the x86 and ARM architectures that dominate the CPUs in Linux boxes today. But that does not mean that x86 and ARM are the only architectures of value, as Jeff Dionne, Rob Landley, and Shumpei Kawasaki illustrated in their LinuxCon Japan session "Turtles all the way down: running Linux on open hardware." The team has been working on breathing new life into a somewhat older architecture that offers comparable performance to many common system-on-chip (SoC) designs—and which can be produced as open hardware.

[Jeff Dionne]

The architecture in question is Hitachi's SuperH, whose instruction set was a precursor to the one used in many ARM Thumb CPUs. But the patents on the most important SuperH designs have all expired — and more will be expiring in the months and years to come — which makes SuperH a candidate for revival. Dionne, Landley, and Kawasaki's session [PDF] outlined the status of their SuperH-based "J2" core design, which can be synthesized in low-cost FPGAs or manufactured in bulk.

Dionne started off the talk by making a case for the value of running open-source software on open hardware. That is a familiar enough position, of course, but he went on to point out that a modern laptop contains many more ARM and MIPS processors than it does x86 processors. These small processors serve as USB and hard-drive controllers, run ACPI and low-level system management services, and much more. Thus, the notion of "taking control of your hardware" has to include these chips as well.

He then asked what constitutes the minimal system that can run Linux. All that is really needed, he said, is a flat, 32-bit memory address space, a CPU with registers to hold instructions, some I/O and storage (from which the kernel and initramfs can be loaded), and a timer for interrupts. That plus GCC is sufficient to get Linux running—although it may not be fast, depending on the specifics. One does not even need a cache, floating-point unit, SMP, or a memory-management unit (MMU).

At this point, Landley chimed in to point out that Dionne had been the maintainer of uClinux, an active project maintaining Linux on non-MMU systems up through 2003; after Dionne handed off maintainership to others, development unfortunately slowed down considerably. The requirements for running Linux are quite low, though; many of the open-hardware boards popular today (such as the Raspberry Pi) throw in all sorts of unnecessary extras.

That brings us to SuperH, which Dionne said was developed with a "massive research and development outlay." The SuperH SH2 was a highly optimized design, employing a five-stage Harvard RISC architecture with an instruction-set density considerably ahead of its contemporaries. That density is a common way to measure CPU efficiency, he explained; a dense architecture requires fewer instructions and thus fewer clock cycles to perform a given task. Most of a CPU's clock cycles are spent waiting for something, he said; waiting for instructions is such a bottleneck that if you can get them fast enough, "it almost doesn't matter what your clock speed is."

The SuperH architecture is so dense that a 2009 research paper [PDF] plotted it ahead of every architecture other than x86, x86_64, and CRIS v32. ARM even licensed the SuperH patent portfolio to create its Thumb instruction set in the mid-1990s.

Fortunately, the patents are now expiring. The last of the SH2 patents expired in 2014, with more to come. The SH2 processor was, he said, used in the Sega Saturn game console; the SH4 (found in the Sega Dreamcast) will have the last of its patents expire in 2016. Though they are older chips, they were used in relatively powerful devices.

[Shumpei Kawasaki and Rob Landley]

In preparation for this milestone, Dionne, Landley, and others have been working on J2, a clean-room re-implementation of the SH2 that is implemented as a "core design kit." The source for the core is written in VHDL, and it can be synthesized on a Xilinx Spartan6 FPGA. The Spartan6 is a low-cost platform (boards can be purchased for around $50), but it also contains enough room to add additional synthesized components—like a serial controller, memory controller, digital signal processor, and Ethernet controller. In other words, a basic SoC.

The other main advantage of the J2 project is that the work for implementing SuperH support is already done in the kernel, GCC, GDB, strace, and most other system components. By comparison, there are a few other open CPU core projects like OpenRISC and RISC-V, but those developers must write all of their code from scratch—if the CPU core designs ever become stable enough to use. As Landley then added, "we didn't have to write new code; we just had to dig some of it up and dust it off."

The project has thus "inherited" an excellent ISA, and has even been in contact with many of the former Hitachi employees that worked on SuperH. But that is of little consequence if a $50 FPGA is the only hardware target. The Spartan6 is cheap as FPGAs go, but still more than most customers would pay for an SoC. So the J2 build chain not only generates a Xilinx bitstream (the output which is then synthesized onto the FPGA); it also generates an RTL circuit design that can be manufactured by an application-specific integrated circuit (ASIC) fabrication service.

Chip fabrication is not cheap if one shops around for the newest and smallest process, Dionne said—but, in reality, there are many ASIC vendors who are happy to produce low-cost chips on their older equipment because the cost of retooling a plant is exorbitant. A 180nm implementation of the J2 design, he said, costs around three cents per chip, with no royalties required. "That's disposable computing at the 'free toy inside' level."

As of today, the J2 is sufficient to build low-end devices, but the roadmap is heading toward more complex designs as more SuperH patents expire. In 2016, the next iteration, called J2+, will add SMP support and an array of DSPs that will make it usable for signal processing in medical devices and in Internet-of-Things (IoT) products such as the oft-cited home electricity monitor. A year or so further out, the J4 (based on the SH4 architecture) will add single instruction, multiple data (SIMD) arrays and will be suitable for set-top boxes and automotive computing.

Landley and Dionne then did a live demonstration, booting Linux on a J2 core that they had synthesized onto an off-the-shelf Spartan6 board purchased the day before in Tokyo's Akihabara district. The demo board booted a 3.4 kernel—though it took several seconds—and started a bash prompt. A small victory, but it was enough to warrant a round of applause from the crowd. Dionne noted that they do have support in the works for newer kernels, too. Landley said that he was still in the process of setting up the two public web sites that will document the project. The nommu.org site will document no-MMU Linux development, he said (hopefully replacing the now-defunct uClinux site), while 0pf.org will document the team's hardware work.

In an effort to reduce the hardware cost and bootstrap community interest, the team is also planning a Kickstarter campaign that will produce a development board — hopefully with a more powerful FPGA than the model found on existing starter kits — in a Raspberry-Pi–compatible form factor. With a larger FPGA, these boards should be able to accommodate the J4 SMP design; the LX9 version of the Spartan6 (which was used for the J2 development systems) simply does not have enough logic gates for SMP usage.

At the end of the talk, an audience member voiced concern that SuperH is old enough that support for it has gone unmaintained in a number of projects; he suggested that the J2 team might need to act quickly to stop its removal. Landley noted that, indeed, the latest buildroot release did remove SuperH support, "but we're trying to get them to put it back now." Luckily, Dionne said, there are other projects keeping general no-MMU support in the kernel up-to-date, such as Blackfin and MicroBlaze. The team has been working on getting no-MMU support into musl and getting some of the relevant GCC build tools "cleaned up" from some minor bit rot.

Another audience member asked whether or not the SuperH ISA was getting too old to be relevant. In response, Dionne handed the microphone over to Kawasaki, who had remained off to the side for the entire presentation up to that point. Kawasaki was one of the original SH2 architects and is now a member of the J2 project. There have been some minor additions, he said: the J2 adds four new instructions. One for atomic operations, one to work around the barrel shifter, "which did not work the way the compiler wanted it to," and a couple that are primarily of interest to assembly programmers. There are always questions about architecture changes, he said, but mostly the question is whether to make the changes mandatory or simply provide them as VHDL overlays. For the most part, though, the architecture already had everything Linux needs and works well, despite its age.

As of today, the nommu.org site is online and has an active mailing list, although the Git repository Landley promised is not yet up and running. The 0pf.org site is also up and running and contains much more in the way of documentation. The project is still in its early stages, but it seems to be generating considerable interest, and several more iterations of open CPU designs are still to come.

[The author would like to thank the Linux Foundation for travel assistance to attend LCJ 2015.]
