FOSDEM: Configuration management

February 16, 2011

This article was contributed by Koen Vervloesem

The 2011 FOSDEM conference had a Configuration and Systems Management developer room on its second day. This first meeting about configuration management and automation with open source tools was organized by the people from Puppet Labs and had a focus on Puppet, but other tools like Chef and Cfengine were also discussed.

Configuration management is about establishing and maintaining consistency of a system throughout its life. For software, this means that the system has to track and control all configuration changes, which can be the contents of files in /etc, the installation of specific packages, file permissions, users, and so on. Having a configuration management tool for your systems is useful in a lot of ways: you can automatically repair a system's configuration after a failure, you can easily reproduce a specific configuration on another system, you can audit changes, and, if you pair the configuration management system with a version control system like Git, you can always return to a known-good configuration if things go wrong. Where configuration management systems really shine is when you have a large number of systems networked together: by automating the configuration, you save the system administrator's time and you're sure that all systems are configured consistently.
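The repair-and-audit idea in the paragraph above can be illustrated with a minimal Python sketch. It is not how Puppet, Chef, or Cfengine are implemented: the dictionary-based state model and the names plan_changes and converge are assumptions made purely for illustration.

```python
# Hypothetical model: desired and actual state are plain dicts keyed by a
# resource description; real tools track far richer resource types.


def plan_changes(desired, actual):
    """Return the settings that must change to reach the desired state."""
    return {
        name: value
        for name, value in desired.items()
        if actual.get(name) != value
    }


def converge(desired, actual):
    """Apply only the needed changes and return them for auditing."""
    changes = plan_changes(desired, actual)
    actual.update(changes)
    return changes


desired = {"/etc/ntp.conf mode": "0644", "package ntp": "installed"}
actual = {"/etc/ntp.conf mode": "0600"}  # a system that has drifted

first_run = converge(desired, actual)   # repairs the drift
second_run = converge(desired, actual)  # finds nothing left to change
```

Because only the differences are applied, running the same description again is harmless, and the returned changes give a simple audit trail of what was repaired.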

The big three configuration management systems for Linux are Puppet (used by Red Hat, Citrix, and the Los Alamos National Laboratory), Chef (used by Engine Yard, 37signals, and Scribd), and Cfengine 3 (used by Facebook, AMD, and the Joint Australia Tsunami Warning Centre). Puppet and Chef are broadly similar in architecture, but Puppet has a language designed specifically for describing resources, while Chef uses the general-purpose programming language Ruby to configure them. Chef also seems to be aimed more at developers who want to deploy their web applications, and it doesn't support as many platforms as Puppet does. Cfengine is the grandfather of these configuration management systems (Cfengine 3 is a total rewrite); its advantages over Puppet and Chef include a lower memory footprint and higher performance, but its popularity has declined in recent years. Other configuration management systems that were present in the developer room are FusionInventory, GLPI, and OPSI.

A meta-distribution

In his case study about Linux system engineering in air traffic control, Stefan Schimanski showed how scalable Puppet really is and how it can guarantee reliable mass deployment of the Linux-based, mission-critical applications needed in air traffic control centers. Air traffic is growing yearly, so the number of computer systems that have to handle these flights is also growing, as is the workload for the system administrators. Moreover, the systems really need 24/7/365 high availability: if they go down for 30 minutes, air traffic control has a really big problem. For example, if a computer in a control center freezes, the operator is essentially blind.

These strong requirements coupled with the growing number of servers mean that air traffic control centers need automatic installations of every system with minimal downtime and fast rollbacks. Moreover, all informal requirements documents, described by non-technical people, should be converted into formal specifications of the configuration of the system, to be able to standardize the systems and make their configuration reproducible. Therefore, Schimanski rethought his system engineering approach in 2010 and turned to Puppet.

One thing that Puppet makes easy is distinguishing between the abstract requirements and the concrete implementation. For each node, the system administrator can define how the node has to be configured in an abstract way, e.g. by including classes for a desktop node, a server node, a webserver node, and so on. By reading these node definitions, you can easily see what the node is supposed to be doing, without having to bother with the concrete implementation, which is written in separate files for these classes. For example, the webserver class installs and configures Apache and also includes the configuration of the server class. Moreover, according to Schimanski a good Puppet configuration introduces traceability, which is essential in that kind of environment: "If someone asks where requirement #91 of the requirements document is implemented, it's easy to point out the Puppet code that implements this."
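The split between abstract node definitions and concrete class implementations can be sketched in Python. This is a rough model under stated assumptions, not Puppet's own syntax or internals: the class catalogue, the node names, and the "type:name" resource strings are all hypothetical.

```python
# Hypothetical class catalogue: each class lists the classes it includes
# and the concrete resources it manages. All names are illustrative.
CLASSES = {
    "server": {"includes": [], "resources": ["package:ntp", "service:sshd"]},
    "webserver": {"includes": ["server"], "resources": ["package:apache2"]},
    "desktop": {"includes": [], "resources": ["package:xorg"]},
}

# Abstract node definitions: what each node is, not how it is built.
NODES = {"www1": ["webserver"], "ws7": ["desktop"]}


def resources_for(class_name, seen):
    """Collect the resources of a class and of every class it includes."""
    if class_name in seen:  # guard against include cycles and duplicates
        return []
    seen.add(class_name)
    spec = CLASSES[class_name]
    collected = []
    for included in spec["includes"]:
        collected.extend(resources_for(included, seen))
    collected.extend(spec["resources"])
    return collected


def catalog(node):
    """Expand a node's abstract classes into its concrete resource list."""
    seen = set()
    result = []
    for class_name in NODES[node]:
        result.extend(resources_for(class_name, seen))
    return result
```

Reading NODES alone tells you what each machine is for; the details live in CLASSES, so in this sketch catalog("www1") picks up the server resources through the webserver class without the node definition mentioning them.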

Another interesting idea that Schimanski introduced in his talk was the concept of a meta-distribution: the air traffic control systems are implemented as SUSE Linux Enterprise and Red Hat Enterprise Linux servers, but the Linux distribution itself is completely interchangeable. The AutoYaST or Kickstart files of the installation are minimal, and almost all configuration is done in the form of Puppet modules, e.g. for NTP and other services. The result is a heavily customized enterprise Linux distribution, but all these customizations are documented in a completely formal way. Schimanski explains the rationale behind this approach:

We don't want to depend on one operating system, so if, hypothetically, Novell stops the development of SUSE Linux Enterprise, we could migrate our systems to Red Hat Enterprise Linux or even Ubuntu Server in only four days without redoing all the configuration work.

To a certain degree, Puppet modules can be written in an operating-system-independent way. There are always some minor differences, such as where the distribution puts its configuration files, but these can be abstracted away with variables whose values (e.g. the file path) depend on the operating system. Of course, you have to check these little things before migrating to another operating system, so it's not effortless, but according to Schimanski, Puppet makes migrating a lot easier.
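A small Python sketch shows the idea of pushing the per-distribution differences into variables. The operating system keys, package names, service names, and paths below are illustrative assumptions, not values taken from the talk, and would have to be checked against each distribution.

```python
# Hypothetical per-OS values; every name and path here is an illustrative
# assumption that must be verified for the distribution in question.
NTP_SETTINGS = {
    "sles": {"package": "ntp", "service": "ntp", "config": "/etc/ntp.conf"},
    "rhel": {"package": "ntp", "service": "ntpd", "config": "/etc/ntp.conf"},
}


def ntp_settings(operating_system):
    """Return the NTP settings for an OS, failing loudly if it is unknown."""
    try:
        return NTP_SETTINGS[operating_system]
    except KeyError:
        raise ValueError(f"no NTP settings defined for {operating_system!r}")


def ntp_resources(operating_system):
    """The OS-independent part: the same three resources everywhere."""
    settings = ntp_settings(operating_system)
    return [
        ("package", settings["package"]),
        ("file", settings["config"]),
        ("service", settings["service"]),
    ]
```

Supporting another distribution then means adding one entry to NTP_SETTINGS; the unknown-OS error is a deliberate design choice so that a missing entry is noticed before a migration rather than during it.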

The Puppet ecosystem

The talks also showed that there is a nice ecosystem of tools developing around Puppet. For example, Henrik Lindberg gave a demo of Geppetto, a new Eclipse-based project developing tools to simplify the process of authoring and using Puppet manifests and modules. The near-term objectives of the project are flattening the learning curve for new Puppet users, supporting best practices, and encouraging the sharing of Puppet modules. Under the hood, Geppetto has a grammar for the Puppet DSL (Domain Specific Language), written with Xtext. Thanks to Xtext, this also automatically results in an Eclipse editor that knows the Puppet language and offers syntax coloring, code completion, code folding, and syntax errors and warnings. Moreover, when creating a Puppet module you can enter metadata and choose dependencies, and at the end you can export the module to a zip file which can be uploaded to the Puppet Forge. The Geppetto integrated development environment can be downloaded as a stand-alone product for Linux, Windows or Mac OS X, or as a separate plug-in for Eclipse.

Another rising star in the Puppet ecosystem is Foreman, presented by its creator Ohad Levy, who joined the ranks of Red Hat in August 2010 as a principal software engineer in its cloud team. The project is now a year and a half old and has 20 contributors, and according to Levy, Foreman will at some point be part of Red Hat's cloud portfolio. Foreman integrates with Puppet and acts as a web-based dashboard for it, providing real-time information about the status of hosts based on Puppet reports, statistics, and so on. Moreover, Foreman takes care of the low-level details of setting up machines and installing the Puppet client on them, up to the point where Puppet can take over and apply the configuration defined in your Puppet modules. It even supports creating virtual machines using the libvirt API, with RHEV-M and Amazon EC2 support in the works. The largest installation managed by Foreman that Levy knows about is running 4000 active hosts. This is clearly a project to watch, as it is backed by Red Hat and it has the potential to make managing an environment with Puppet a lot easier.

Configuration management is not only useful for system administrators installing servers, but also for developers setting up their development environment. Gareth Rushgrove talked about using configuration management tools to get new employees up and running quickly with a development virtual machine. Especially interesting was his coverage of Vagrant, a tool for automated virtual machine creation for Oracle's VirtualBox. With the virtual environments provisioned automatically by Puppet or Chef, developers can get a complete development environment up and running in no time. Users can configure Vagrant to forward ports to the host machine, to set up shared folders, and so on. It's also possible to package an environment in a distributable box, and rebuilding a complete environment from scratch or tearing it down when you're done takes a single command. Normally users start by downloading a base box to use with Vagrant (the default one is Ubuntu Lucid Lynx), but they can also build their own base box with a tool like VeeWee.

Lessons for disaster recovery

While Puppet clearly was the most visible configuration management system at FOSDEM, it was not the only one. Joshua Timberman, Sr. Technical Evangelist at Opscode (the creators of Chef), gave a short "Chef 101" talk, followed by an overview of how to use Chef to deploy applications with nothing but the source code repository and data about the application configuration. Traditionally, one deploys applications with tools like tar, rsync and (in the Ruby world) cap deploy, but what do you do then with the server configuration, like that needed for web servers, load balancers, and database servers? Timberman showed how you can easily deploy web applications with their corresponding servers using various server roles configured in Chef cookbooks. The Chef server itself is a lightweight Ruby on Rails application, and the largest Chef deployment that Timberman knows about has 5000 nodes checking in to the Chef server every 30 minutes.

The first talk of the day was by Nicolas Charles and Jonathan Clarke, who presented their use of Cfengine at their company, Normation, and focused on their experiences with disaster recovery. All their services (web, email, Git repository, Redmine, ...) were running on one hosted server. This server used a three-disk RAID5 array, with daily backups, separate virtual machines for each service, and all services automatically installed and configured using Cfengine 3.

When two hard drives failed simultaneously, they first thought this would be easy to repair, as they had backups and used a configuration management system. However, it turned out that they had forgotten some things. For example, they had neither automated nor backed up the configuration of the virtual machines, so these had to be re-created manually. Moreover, after watching all the services come back online with the right configuration thanks to Cfengine 3, they found that they had to restore the backups manually, and then discovered that a couple of files were missing. The three big lessons here are: don't forget to describe your virtualization setup in your configuration management system, tie your configuration management system in to your backup tool, and always test your backups.
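The last lesson, testing your backups, can be made concrete with a Python sketch that compares a test restore against a checksum manifest of the live files. This is not what Normation did; the function names, the file names, and the temporary directories standing in for a live system and a restore target are all assumptions for illustration.

```python
import hashlib
import os
import tempfile


def checksum(path):
    """Return the SHA-256 digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(directory, name)
            manifest[os.path.relpath(full, root)] = checksum(full)
    return manifest


def verify_restore(manifest, restored_root):
    """Return the files that are missing or altered in a test restore."""
    restored = build_manifest(restored_root)
    missing = sorted(set(manifest) - set(restored))
    altered = sorted(
        path for path in manifest
        if path in restored and restored[path] != manifest[path]
    )
    return missing, altered


# Demonstration with throwaway directories and hypothetical file names.
with tempfile.TemporaryDirectory() as live, tempfile.TemporaryDirectory() as restore:
    for name in ("httpd.conf", "redmine.yml"):
        with open(os.path.join(live, name), "w") as handle:
            handle.write("settings for " + name)
    manifest = build_manifest(live)
    # Simulate an incomplete backup: only one of the two files comes back.
    with open(os.path.join(restore, "httpd.conf"), "w") as handle:
        handle.write("settings for httpd.conf")
    missing, altered = verify_restore(manifest, restore)
```

Running such a check on a schedule against a scratch restore would surface missing files long before a real disk failure does.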

The system administrator as glue

The quote that best summarized the "don't reinvent the wheel" approach of configuration management came from Levy's talk: "Automate as many processes as possible, using best practices where available, and act as the glue between the gaps." In this regard, it is interesting to know that everyone can share their Chef "cookbooks" (packages of "recipes") on cookbooks.opscode.com, and Puppet users can share their Puppet modules on the Puppet Forge. This is great for new users, who can study other users' modules and reuse them in their own infrastructure. Your author had already automated some of the services on his home network with Puppet, and this configuration management track at FOSDEM was inspiring enough to continue this approach and decrease the amount of glue in his network.


FOSDEM: Configuration management

Posted Feb 17, 2011 13:23 UTC (Thu) by pcampe (guest, #28223)

Thanks for the very informative article.

It's a bit strange that, as system administrators, we have bullet-proof tools for fault management and performance monitoring (the latter at least if you work with a black-box model for a system, performance tracking at transaction level is an entirely different beast) and on the configuration management side we are at the very beginning of a widespread adoption of these or other tools.

I think that configuration management is not just something you need when you have thousands of maybe almost identical servers, but an essential foundation for your infrastructure; especially because you can describe a good part of a system administrator's work as "something that changes the configuration of a system", so managing the configuration will help a lot in managing, staffing and evaluating system administrators.

FOSDEM: Configuration management

Posted Feb 18, 2011 3:31 UTC (Fri) by pabs (subscriber, #43278)

I wanted to deploy Puppet at $work but writing my own set of modules is a huge brick wall that I was just too lazy to get through. I hope that Puppet Forge gives rise to a standardised collection of portable modules that is shipped with Puppet itself.

FOSDEM: Configuration management

Posted Feb 18, 2011 17:08 UTC (Fri) by droundy (subscriber, #4559)

I can only agree that puppet is a royal pain to use. It seems to be designed for people who manage thousands of computers as their full-time job. For those of us who just manage a dozen or so computers as a peripheral aspect of our job, it's a whole lot of work to set up.

FOSDEM: Configuration management

Posted Feb 25, 2011 22:20 UTC (Fri) by filteredperception (subscriber, #5692)

I looked at puppet years ago and came to that same conclusion. I've continued work on my own system (viros.sf.net), targeting users with even a smaller base of systems, i.e. a random engineer with a couple personal laptops/workstations and servers. Mainly from my own obsession with easy personal, and business disaster recovery, which is not so unrelated to the process of having to move to a new major linux distro version every 6 months. In any event, I did enjoy (and will link to) this excellent article on the current state of the art. While I still like my metaphor- OS as a virus/dna, configuration classes/modules as genes/traits with subtraits, I am glad to see that all the systems are basically headed down similar paths. I would probably choose puppet for a new 1000+ system deployment, suffering its apparently still steeper-than-ideal learning/complexity curve.

FOSDEM: Configuration management

Posted Feb 26, 2011 2:05 UTC (Sat) by nwmcsween (guest, #62367)

I worked on creating a provider for a package manager with puppet; I stopped when I saw a variable being used as either a boolean or a string... don't use puppet, the architecture is broken.

FOSDEM: Configuration management

Posted Feb 19, 2011 0:47 UTC (Sat) by gerdesj (subscriber, #5446)

Well - thanks a lot. I've been using Puppet for ages and it's really good but I'd never heard of cfengine or chef until now. They are both also in Portage, so now I have to eval. them as well!

Actually I won't bother - but it's nice to have a choice ...

When you have a system like Gentoo to worry about, being able to offload stuff with Puppet makes life so much easier. You get the power of choice with Gentoo and the predictability that a thorough Puppet implementation provides.

I have rather a lot of web 'n' email proxies to look after, some of which are well over 5 years old and yet run the latest code without a reinstall or nasty "upgrade". I also have quite a stringent dev -> test -> production process but it works very well and the Gentoo unpredictability snag gets offset nicely.

I even use it to upgrade VMs from x86 to amd64. Which is working OK so far 8)

Cheers
Jon

FOSDEM: Configuration management

Posted Feb 19, 2011 1:16 UTC (Sat) by gerdesj (subscriber, #5446)

Sorry, forgot to say - belting article. As usual, LWN scores highly.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds