New etcd, appc, and Rocket releases from CoreOS

February 4, 2015

This article was contributed by Josh Berkus

CoreOS has become "the other Linux container startup", rivaling Docker Inc. in both advancing and controlling the specification for the rapidly evolving container-based deployment and cloud ecosystem. As part of promoting its platform and projects, CoreOS holds monthly Meetup events at Rackspace's Geekdom shared office in San Francisco. The CoreOS Meetup on January 27 turned into a release party for two CoreOS projects, etcd and the appc specification. CoreOS staff, project contributors, and a high-profile user explained what was in the new versions, as well as what each project was working on.

etcd 2.0

CoreOS CTO Brandon Philips started things off by announcing etcd 2.0, which was released on January 28. Version 2.0 brings several advancements, including backup and restore of the data stored in the cluster, substantial stability improvements, new configuration tools, and a bootstrapping mode for creating new clusters. As LWN explained in a previous article, etcd is a fault-tolerant, consistent, durable, distributed key-value store. CoreOS created etcd to provide a shared configuration store for a large server cluster.

Etcd 2.0 became a release candidate on December 18 and had over a month of testing. As this is the first "stable" version, project members were concerned that it be relatively bug-free and that the APIs be stable hereafter with minimal breakage.

After briefing attendees about how etcd works, Philips explained the jump in version numbers, since the previous released version of etcd was 0.4.6. The REST API for etcd, which is its primary interface, also carries a version number that was already version 2. CoreOS staff felt that it would be confusing for users to have an API version higher than the software version, so they skipped 1.0 completely.
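To make the "API version 2" point concrete, here is a minimal sketch of how a client might address the v2 keys API over HTTP. The endpoint address, the key name, and the helper functions are assumptions made for illustration; the sketch only builds the request and does not contact a server.

```python
import urllib.parse
import urllib.request

# Assumed local endpoint; adjust to wherever your etcd node listens.
ETCD = "http://127.0.0.1:2379"

def key_url(key: str) -> str:
    """Return the v2 keys-API URL for a key such as 'app/db_host'."""
    return f"{ETCD}/v2/keys/{urllib.parse.quote(key.strip('/'))}"

def put_request(key: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets key to value."""
    body = urllib.parse.urlencode({"value": value}).encode()
    return urllib.request.Request(key_url(key), data=body, method="PUT")

req = put_request("app/db_host", "10.0.0.5")  # hypothetical key and value
print(req.get_method(), req.full_url)
# To send it against a running node: urllib.request.urlopen(req)
```

Note that the "v2" in the path is the API version the article refers to; it is independent of the etcd release number.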

The project is now two years old, and has received contributions from 140 people. As a project, etcd has been incorporated into many other products and projects, including Mailgun's vulcand load-balancer, the confd distributed configuration file tool, distributed Git servers, Google's Kubernetes orchestration manager, Apache Mesos, and Yodlr (see below).

The most user-visible new features of etcd 2.0 are the new administration commands. First, "etcdctl backup" and "etcdctl restore" allow using files to back up and restore the data from a running etcd cluster cleanly and safely. Second, the "etcdctl member" commands permit users to add and remove nodes from their etcd cluster without the need to change configuration files and restart the cluster.

The new version also implements a "proxy mode" for etcd nodes. This allows users to add etcd servers that do not participate in consensus and failover, but instead just mirror the data available through the main nodes. This feature supports much larger etcd clusters with high read loads, such as when etcd is being used to support an infrastructure of hundreds or thousands of containers.

Less visible to users but even more important are the stability and data-integrity improvements. First, the project improved the Raft algorithm implementation, which supports etcd server consensus and leader election, by borrowing some ideas from the CockroachDB project. The project also changed the way etcd's Write Ahead Log (WAL) is used, both to support backup, and to prevent certain kinds of data-corruption failures.
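As background on why consensus membership matters, Raft commits a change only once a majority of voting members has accepted it. The arithmetic can be sketched as follows; the function names are illustrative and are not taken from etcd.

```python
def quorum(members: int) -> int:
    """Smallest number of voting members that forms a majority."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority remains reachable."""
    return members - quorum(members)

for n in (1, 3, 4, 5):
    print(f"{n} members: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from three members to four raises the quorum without tolerating any additional failures, which is why odd cluster sizes are the usual recommendation for majority-based systems.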

"Filesystems truncate and corrupt data," explained Philips. "We used to rewind the log, which would cause failures when the filesystem did something unexpected. We also added checksums to the log."
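The idea of checksumming a log can be illustrated with a small sketch. This is not etcd's WAL format; it is a toy append-only log, assuming a fixed header holding the payload length and a CRC32, that shows how a reader can stop cleanly at a truncated or corrupted record instead of replaying bad data.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload

def append(log: bytearray, payload: bytes) -> None:
    """Append one record: header followed by the payload bytes."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload

def replay(log: bytes) -> list[bytes]:
    """Return every intact record, stopping at the first damaged one."""
    records, pos = [], 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # truncated or corrupted tail: do not trust it
        records.append(bytes(payload))
        pos = start + length
    return records

log = bytearray()
for entry in (b"a=1", b"b=2", b"c=3"):
    append(log, entry)

print(replay(log))        # all three records
print(replay(log[:-1]))   # last record truncated, so only two survive
```

A production log would also need to decide what to do after detecting damage (for example, repair from a peer); the sketch only covers detection.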

Because misconfiguration of a cluster is easy to do, the team added UUIDs to identify both individual nodes and the etcd cluster. This UUID is now used in every API request, in order to make sure that nodes don't attach to the wrong cluster or peer with the wrong node.

Kelsey Hightower, a contributor to etcd and a CoreOS staff member, then presented etcd's new bootstrapping features. Previously, one of the major problems was that an etcd cluster required at least two nodes to operate, but you couldn't configure etcd nodes to communicate until you had a cluster, creating a "catch-22". Version 2.0 implements a new "bootstrapping mode" that allows a new cluster to come up and establish peering.

This bootstrapping has three modes: static, DNS, and discovery-based. Static mode just uses command-line switches to tell each node what cluster to join. DNS mode uses SRV records from the DNS server to inform each etcd node of its initial cluster membership.

CoreOS's preferred mode is discovery-based, where each node is given a URL from which to obtain an initial token, and then peers with other nodes that have that token. The URL is that of a single-node etcd server, run just for the bootstrapping process. While users can run their own, CoreOS runs a public discovery key server at discovery.etcd.io in order to eliminate a step.
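The difference between static and discovery-based bootstrapping shows up mainly in the command line given to each node. The sketch below assembles the arguments for both modes. The flag names are written as I understand them from the etcd 2.0 documentation and should be checked against "etcd --help"; the node names, addresses, and the "<token>" placeholder are hypothetical.

```python
def static_flags(name: str, peers: dict[str, str]) -> list[str]:
    """Static mode: every member and its peer URL is listed up front."""
    cluster = ",".join(f"{n}={url}" for n, url in peers.items())
    return [
        "--name", name,
        "--initial-advertise-peer-urls", peers[name],
        "--listen-peer-urls", peers[name],
        "--initial-cluster", cluster,
        "--initial-cluster-state", "new",
    ]

def discovery_flags(name: str, peer_url: str, discovery_url: str) -> list[str]:
    """Discovery mode: nodes only share a URL and find each other through it."""
    return [
        "--name", name,
        "--initial-advertise-peer-urls", peer_url,
        "--listen-peer-urls", peer_url,
        "--discovery", discovery_url,
    ]

# Hypothetical three-node cluster.
peers = {
    "infra0": "http://10.0.1.10:2380",
    "infra1": "http://10.0.1.11:2380",
    "infra2": "http://10.0.1.12:2380",
}
print(" ".join(["etcd"] + static_flags("infra0", peers)))
print(" ".join(["etcd"] + discovery_flags(
    "infra0", peers["infra0"], "https://discovery.etcd.io/<token>")))
```

In static mode the full membership must be known before any node starts, while in discovery mode the same URL can be handed to every node, which is convenient when addresses are assigned dynamically.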

The etcd 2.0 launch finished with a brief presentation by Ross Kukulinski, founder of Yodlr, on using etcd to build a web application. Yodlr is a new live chat and voice collaboration tool that was created by the training team of a large company for its internal use. When customers became more interested in the chat tool than in the training, the team had to scale out the service quickly. Etcd was indispensable in coordinating user sessions across multiple servers.

The appc Standard and Rocket

Jonathan Boulle, a senior engineer at CoreOS, explained the App Container specification (appc). Both this specification and the rkt, or "Rocket", container runtime were released as version 0.2.0 on January 23. The appc team hopes that this means a stable version of the standard will be released next.

The purpose of appc is to create a universal standard for application containers, to allow them to be implemented in ways that are vendor-independent and OS-independent. Currently, the specification is supported by corporate partners Mesosphere and Pivotal, but most development and revisions still come from the CoreOS staff.

This specification covers the four main components of how applications should be run in containers:

Image Format: specifies the structure of the image file for the guest runtime environment. This is simply a tarball containing a root filesystem and a JSON-format manifest, as well as an image identifier.
Image Discovery: specifies a federated namespace for image names, which use a URL-like structure.
Executor: specifies how the runtime environment for applications works, including the handling of filesystem mounts and environment variables.
Metadata: specifies how each executor and container offers metadata, including a container ID and hash-based message authentication code (HMAC) key.
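The image format described in the list above (a tarball holding a root filesystem plus a JSON manifest) can be sketched in a few lines. The manifest field names and the "manifest" and "rootfs/" layout below are written as I recall them from the appc specification and are simplified; the image name and file contents are hypothetical, and a real image would need to satisfy the full specification.

```python
import io
import json
import tarfile

def build_image(name: str, exec_argv: list[str], files: dict[str, bytes]) -> bytes:
    """Return an uncompressed tarball with a manifest and a rootfs/ tree."""
    manifest = {
        "acKind": "ImageManifest",   # assumed field names, simplified
        "acVersion": "0.2.0",
        "name": name,
        "app": {"exec": exec_argv, "user": "0", "group": "0"},
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        def add(path: str, data: bytes) -> None:
            info = tarfile.TarInfo(path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

        add("manifest", json.dumps(manifest).encode())
        for path, data in files.items():
            add(f"rootfs/{path.lstrip('/')}", data)
    return buf.getvalue()

image = build_image(
    "example.com/hello",                       # hypothetical URL-like name
    ["/bin/hello"],
    {"/bin/hello": b"#!/bin/sh\necho hello\n"},
)
with tarfile.open(fileobj=io.BytesIO(image)) as tar:
    print(tar.getnames())
```

The URL-like image name is what ties the image format to the discovery component: a runtime can derive where to fetch an image from the name alone.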

The new 0.2 release of the specification now includes discovery authentication for secure service discovery on shared networks. It also includes HMAC signature validation for containers, which has moved to the SHA512 algorithm in order to take advantage of processor acceleration.
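For readers unfamiliar with HMAC, the following sketch shows signing and verifying a message with HMAC-SHA512 using Python's standard library. It illustrates the primitive only; the key, the message, and the way the appc metadata service actually applies HMAC are assumptions here, not details from the specification.

```python
import hashlib
import hmac

def sign(key: bytes, message: bytes) -> str:
    """Return the hex HMAC-SHA512 of message under key."""
    return hmac.new(key, message, hashlib.sha512).hexdigest()

def verify(key: bytes, message: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    return hmac.compare_digest(sign(key, message), signature)

key = b"per-container-secret"        # hypothetical key
message = b"container-id=1234"       # hypothetical payload
signature = sign(key, message)

print(verify(key, message, signature))              # True
print(verify(key, b"container-id=9999", signature)) # False
```

The constant-time comparison matters because a naive string comparison can leak, through timing, how many leading characters of a forged signature were correct.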

Appc has also inspired a few non-CoreOS projects and implementations. One such is Jetpack, a FreeBSD application container executor created by a Polish team. There is also libappc, a C++ library for working with containers, and docker2aci, a tool for converting Docker images to appc format.

The primary implementation of appc remains rkt, or "Rocket", which is CoreOS's own container runtime, as previously covered in LWN. Rocket is the demonstration implementation of "stage 1" from the appc specification, including the container format and metadata.

The primary difference between Docker and Rocket from a user perspective is that Docker runs as a system daemon that handles all container management, while Rocket is implemented as a standalone binary. The idea of Rocket is a minimal implementation that makes use of the host OS's init system and tools. In the CoreOS distribution, this means using systemd to manage containers. Also, because Rocket is a standalone program, it relies on file-based locking rather than using a lock daemon.
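File-based locking without a daemon can be illustrated with advisory locks on an ordinary file. The sketch below uses flock() through Python's fcntl module on a Unix-like system; it is a generic illustration of the technique, not Rocket's code, and the lock file name is hypothetical.

```python
import fcntl
import os
import tempfile

def try_lock(path: str):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f

lock_path = os.path.join(tempfile.mkdtemp(), "container.lock")

first = try_lock(lock_path)    # succeeds
second = try_lock(lock_path)   # a separate open of the same file is refused
print(first is not None, second is None)

first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # now succeeds
print(third is not None)
```

Because the kernel drops the lock when the holder's file descriptor is closed, including when the process dies, no separate daemon is needed to clean up after a crashed holder.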

Version 0.2 implements three new commands: "status" to get the status of running containers; "enter" to attach the terminal to a running container; and "gc" to perform garbage collection of dead containers. The new version also implements public key validation for trusted repository container images, which works in much the same way as the keys for Apt repositories on Debian.

Boulle also discussed and demonstrated what's currently in development for version 0.3.0. This includes an "rkt trust" command for easy key validation and improved support for group permissions and non-systemd init systems. Rocket 0.3 will also support secure image hosting via Quay.io, one of CoreOS's commercial services.

Having discussed the features, Boulle then went over some of the areas of Rocket and appc that still need work.

First, Rocket leverages systemd heavily, but has systemd from CoreOS bundled into the rkt binary because of varying OS-level support for systemd. While this makes Rocket easy to install currently, it causes serious problems with Linux packaging systems. Systemd needs to be decoupled, which will also make it possible to swap execution environments for Rocket.

Second, networking is still incomplete. Appc specifies a rule of "one IP address per container", but not how that IP address is to be obtained. Currently the team is working on a plugin-based system in order to support multiple ways of allocating IPs. This is part of the specification in active development.

The third major issue is that Rocket doesn't yet have any tools to build images. The plan is to have a tool, which is completely separate from the Rocket runtime, that builds images according to the appc specification. While there are several ways to create a root filesystem using Linux tools, most Rocket users create images by converting Docker images.

Conclusion

CoreOS, etcd, appc, and Rocket seem to have a strong development momentum, with rapid releases and a lot of new features and products. While the Docker/CoreOS split originally looked like it might be an iceberg in the path of Linux containerization, instead it seems to be driving intense innovation in more directions than could have been embraced by the Docker team itself. Regardless of the success of any of the individual projects, Linux users (as well as FreeBSD and illumos users) have new and rapidly evolving options for container-based deployment. No matter how it works out, it will be exciting to watch.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 5, 2015 18:29 UTC (Thu) by philipsbd (subscriber, #33789)

Thanks for writing the article.

One little clarification: the etcd team wrote the new raft library for etcd 2.0 then later when the CockroachDB team was researching how to implement their "multi-raft" library we ended up working with them to share the raft library across projects. The use cases of etcd and CockroachDB are fairly different but since we had implemented a clean and well-tested raft state machine we were able to share that library quite nicely.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 5, 2015 18:33 UTC (Thu) by jberkus (guest, #55561)

Brandon,

Sorry about that; I misunderstood what was said about CockroachDB in the session. It sounded like you said that you'd received improvements from CockroachDB rather than the other way around.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 8, 2015 14:58 UTC (Sun) by kleptog (subscriber, #1183)

I was curious about etcd/fleetd but was put off by the fact that internal security is practically non-existent. AFAICS any container using etcd can read/write the config of any other container using etcd. That's not really acceptable in some situations.

Has this changed recently?

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 8, 2015 18:54 UTC (Sun) by raven667 (subscriber, #5198)

I doubt it, but I don't think this kind of security is in-scope because containers in general are not currently suitable for multi-tenancy. Their usefulness right now, and with Docker or Rocket in particular, is in software lifecycle management, making it easy to install and uninstall cleanly on your machines. So if I were using this in production I'd create separate clusters for Internet facing apps and Intranet facing apps or wherever you have a hard security boundary based on the data being processed (PCI or HIPAA or trade secrets or whatever).

If someone breaks into your app server which is handling sensitive data then they probably have access to your sensitive data and etcd security is the least of your problems.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 8, 2015 21:15 UTC (Sun) by jberkus (guest, #55561)

Well, that depends on what you mean by "security".

I asked about security during the session. I was told that, while there's no password and user security for etcd, they have implemented full SSL security, including client machine certificates. So you can make sure that only machines you authorize can connect in the first place, and connections are encrypted.

If you think about it, this makes a lot of sense for etcd, since for systems using etcd, passwords are part of the etcd payload, so if you required user/password information to connect to etcd, you'd have a catch-22. etcd isn't a general-purpose key-value store; it's specifically meant for system configuration information.

I didn't report on this in the article above because I wasn't aware it was a change from 0.4. Is it?

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 8, 2015 23:41 UTC (Sun) by dlang (guest, #313)

That's a good first step, but the parent poster was talking about a problem one layer higher. If you use etcd to manage your datacenter, can someone who breaks into a webserver use the connectivity to etcd to create an account on your database server for them to use?

If any machine can submit changes that affect other machines, that's a problem. If there is something in place to let a machine only change its own data (with some sort of permission system to allow more, or only the master box can allow more), then it would not be a problem. Personally, I think a mode like this should be the default, but even if the default is wide open but can be locked down, it's acceptable.

If you can't lock it down, you would need to have a separate etcd setup/farm/cluster/whatever for each group of servers that you consider different from a security point of view (which could easily mean that all machines on the same network can be part of the same etcd setup in some networks)

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 9, 2015 0:20 UTC (Mon) by jberkus (guest, #55561)

As far as I know, etcd has no permissions system. Hopefully one of the actual contributors will speak up here.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 9, 2015 8:18 UTC (Mon) by kleptog (subscriber, #1183)

dlang got it right. I have a real need for something like etcd but the lack of a security system means it won't fly in my situation. Hence I continue with my hand-made hand-managed kludge, which does work.

I've thought about adding it myself to etcd, but ISTM that if even basic permissions are not there yet then it looks like the developers don't consider security important. Which means I'd need to audit the entire codebase, as well as any new release when I wanted to upgrade. My kludge isn't that bad yet.

Which is unfortunate. The scene with containers (Docker, Rocket) is moving fast now and has the potential to make things a lot easier for everyone, yet no-one appears to consider security even slightly interesting. (See also Docker not verifying downloaded images).

With etcd, you wouldn't even have to go far. If you could limit all writes to a handful of IP addresses, you'd have addressed 99% of the problem. What remains is "is it a problem that you can read the configuration of other machines", which isn't a problem in many situations. Even now you could build something on etcd with encryption/signing to turn a compromise into merely a DOS.

Security isn't hard or complicated, you just have to consider it when building things.

Actually, the thought just occurred to me that you could layer The Update Framework on top of etcd. They've looked hard at the problem of secure updates, delegation, recovering from compromise, etc. Interesting weekend project.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 12, 2015 3:35 UTC (Thu) by krakensden (subscriber, #72039)

Does Consul (one of the closest competitors)? They talk a good game about symmetric keys, but it's hard to tell.

New etcd, appc, and Rocket releases from CoreOS

Posted Feb 12, 2015 7:52 UTC (Thu) by kleptog (subscriber, #1183)

Thanks for the tip. Looking at the documentation they don't talk about it, but because they use plain HTTP you could stick a proxy in front of it (say nginx) to restrict PUT/DELETE to particular IPs. They support a recurse option, so you'd have to find a way to restrict that.

etcd also uses HTTP, but the certificates part might make it hard to MITM it like that.

Still, new possibilities...