A distributed lock manager for OpenStack?
On the first day of the Tokyo OpenStack Summit, a potentially contentious topic was discussed in the Design Summit: should OpenStack adopt a single distributed lock manager and, if so, which should it be? The cross-project session was broken into two parts: the first targeted that initial question, while the second looked at the implications of whatever decision was reached. The discussion and decision provided an interesting look into some of the inner workings of the project.
Hot on the heels of the October 15 release of OpenStack Liberty, the developers gathered in Tokyo October 27–30 to determine what would be in the next release, Mitaka, which is due in April 2016. But the summit is also an opportunity to look at longer-term changes that will come in releases over the next year or two. Mike Perez, the cross-project developer coordinator at the OpenStack Foundation, moderated the two sessions, which apparently turned out not to be quite as contentious as had been feared.
The overall problem has been summarized in a document: "Chronicles of a distributed lock manager". There is a need for various OpenStack components to perform some operations atomically, which generally means some kind of locking solution is required. Because OpenStack is a distributed system, though, a distributed lock manager (DLM) is needed. Currently, each sub-project has dealt with the problem on its own, typically by storing a lock in its database.
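To make the "lock in the database" pattern concrete, the sketch below shows roughly what such an ad hoc scheme tends to look like; the table, column, and connection details are purely illustrative and are not taken from any particular sub-project.

```python
# A simplified illustration of the ad hoc "store a lock in the database"
# pattern; table and column names here are hypothetical.
import sqlalchemy as sa

engine = sa.create_engine("mysql+pymysql://user:secret@dbhost/cinder")

def try_acquire(resource_id, owner):
    """Claim the lock row atomically; returns True if we now own it."""
    with engine.begin() as conn:
        result = conn.execute(
            sa.text("UPDATE resource_locks SET owner = :owner "
                    "WHERE resource_id = :rid AND owner IS NULL"),
            {"owner": owner, "rid": resource_id})
        return result.rowcount == 1

def release(resource_id, owner):
    """Drop the lock, but only if we still hold it."""
    with engine.begin() as conn:
        conn.execute(
            sa.text("UPDATE resource_locks SET owner = NULL "
                    "WHERE resource_id = :rid AND owner = :owner"),
            {"owner": owner, "rid": resource_id})
```

A scheme like this works until a lock holder dies without releasing its row, at which point something has to notice and clean up; every sub-project's home-grown variant has to solve that on its own.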
The proliferation of these ad hoc solutions is becoming a problem for the overall project. In addition, there are other sub-projects that would like to have some kind of locking, but would rather not create their own. That led to the idea of choosing a DLM to ship with OpenStack that sub-projects could rely upon being present. That immediately leads to a second question: which?
There are various options for a DLM that are laid out in the Chronicles document. As might be guessed, each has its strengths and weaknesses. The discussion mostly focused on three: Apache ZooKeeper, etcd, and Consul. Each brings additional features that will be of use to some sub-projects, such as leader election and service discovery.
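As a taste of those extra features, here is a hedged sketch of leader election using kazoo, the Python ZooKeeper client; the ZooKeeper address, paths, and identifiers are made up for illustration.

```python
# A minimal leader-election sketch using kazoo (Python ZooKeeper client).
# Host, path, and identifier are illustrative only.
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

def act_as_leader():
    # Called only while this process holds the leadership; if the process
    # dies, its ephemeral znode vanishes and another contender takes over.
    print("this node is now the leader")

election = zk.Election("/openstack/elections/scheduler", identifier="node-1")
election.run(act_as_leader)   # blocks until elected, then runs the callback
```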
There was some discussion of various sub-projects and their requirements, such as for the Cinder block storage component, the Ironic bare-metal provisioning handler, and the Heat orchestration system. There were obvious parallels between each project's needs, with many needing service discovery and leader election as well as shared locks. The Chronicles document looks at even more of the sub-projects; there were a few more added to the Etherpad notes from the sessions.
One of the main questions is whether operators of OpenStack clouds would "vomit" if they were required to install a specific DLM. An informal straw poll of those in the room found that each of the major options had some opposition. While ZooKeeper has the most features, there were a number of concerns around it, largely because of its implementation language: Java. There are operators who do not want to add the Java Virtual Machine (JVM) into their operations, so the decision comes down to a "Java vs. non-Java" question (both etcd and Consul are written in Go).
But fair locks (ones that prevent starvation) can only be implemented with ZooKeeper, so there was a question about whether that feature was needed. So far, at least, there are no sub-projects that require fair locking, but it certainly seems like something that may be needed down the road. Restricting the project to a solution that cannot provide fair locking struck some as short-sighted. Others noted that there would be a chance to re-address the question in six months, since only one or two projects (likely Cinder and, possibly, Ironic) would have switched to anything new.
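The fairness property comes from how the standard ZooKeeper lock recipe is built: waiters queue up as ephemeral sequential znodes and are granted the lock in arrival order, so no waiter is starved by later arrivals. A minimal sketch with kazoo (connection details are assumptions for the example):

```python
# Fair (FIFO) locking with ZooKeeper via kazoo; host, path, and identifier
# are illustrative only.
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk1:2181")
zk.start()

# Each waiter creates an ephemeral sequence node under the lock path and is
# granted the lock when its node has the lowest sequence number, so requests
# are served strictly in the order they arrived.
lock = zk.Lock("/openstack/locks/volume-42", identifier="cinder-worker-1")
with lock:
    pass  # critical section: operate on the resource exclusively

zk.stop()
```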
There was a suggestion that instead of choosing one DLM, the project could adopt an abstraction layer, perhaps one based on the optional OpenStack Tooz library. That would allow those who wanted a different DLM to run it with a driver to present the common API. There was a mixed reaction to that idea as some clearly felt that an opinionated choice should be made. OpenStack Foundation Director of Engineering Thierry Carrez said that if one DLM was picked, the overall sense of the room seemed to be for ZooKeeper.
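To show what the abstraction-layer approach would look like in practice, here is a hedged sketch using Tooz, where the backend is selected by URL; swapping ZooKeeper for another DLM is then, at least in principle, a configuration change. The URL and names below are illustrative.

```python
# A minimal Tooz sketch: the coordination backend is chosen by URL, so the
# calling code does not care which DLM is underneath.
from tooz import coordination

coordinator = coordination.get_coordinator(
    "zookeeper://zk1:2181",    # or another backend URL, given a driver for it
    b"cinder-worker-1")        # unique member id for this process
coordinator.start()

lock = coordinator.get_lock(b"volume-42")
with lock:
    pass  # critical section protected across the whole deployment

coordinator.stop()
```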
But running ZooKeeper on the JVM from the OpenJDK project was of concern to some. Most sites run ZooKeeper with the Oracle JVM, so there may be problems specific to OpenJDK that would not be addressed quickly by the ZooKeeper upstream. Running the Oracle JVM is a non-starter for some operators, however. In addition, ZooKeeper isn't really a DLM, but a toolkit for building one, one attendee noted, which may make it hard for others to replicate the DLM that was built and tested by OpenStack.
On the other hand, maintaining an abstraction layer for each DLM choice would be a burden on the project. In addition, there are going to be quirks in each one, and it would be better to design around the quirks of one, rather than three (or more). But others noted that OpenStack would likely only build one driver (for ZooKeeper) and that others would need to fill in the abstraction layer for the DLMs of interest to them.
There is an established pattern in OpenStack of having abstraction layers and being inclusive, one attendee said. But there are major advantages to having at least one DLM available, rather than the zero available today, so it makes sense to focus on getting that first one in place.
Carrez said that he had come into the session thinking that a choice for a single DLM should be made but, at the end, he was convinced that an abstraction layer was the right approach. That seemed to be agreeable to most in the room (who represented multiple sub-projects and project constituencies). It was also agreed that the default would be ZooKeeper.
After a short break, with some participants having to head off to other sessions, the implications of the decision to have an abstraction layer were discussed. First off, there were some thoughts presented about how components like Ironic could be upgraded in place from their existing database locks to something DLM-based, with minimal downtime. The basic problem is in how to migrate an existing lock from the database to the new scheme without losing track of it during the upgrade phase. Ironic developers seemed confident they had an approach that would work.
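Purely as an illustration of the kind of transition involved (and not necessarily the approach the Ironic developers have in mind), one generic pattern is to hold both the legacy database lock and the new DLM lock during the rolling-upgrade window, so that old and new services never disagree about whether a resource is locked; the helpers below are hypothetical.

```python
# Hypothetical transition helper: take both the old database lock and the new
# DLM lock during an upgrade window. Not the actual Ironic design.
from contextlib import contextmanager

@contextmanager
def transitional_lock(resource_id, acquire_db_lock, acquire_dlm_lock):
    # Old services still look at the database row, new services look at the
    # DLM; holding both keeps either kind from seeing the resource unlocked.
    with acquire_dlm_lock(resource_id):
        with acquire_db_lock(resource_id):
            yield
```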
Using Tooz as the DLM abstraction layer seemed the obvious approach, but there are some problems with the existing drivers for Tooz. For example, the database driver cannot actually provide what the Tooz API promises: a SQL database cannot handle some of the DLM failure modes, so the driver would appear to provide DLM functionality when it actually cannot, which is why it needs to be removed. Similarly, the interprocess communication (IPC) driver may not be able to faithfully implement the API.
There is a question of how to decide which drivers will be accepted into Tooz. The concern is that drivers might get written for DLMs that cannot truly fulfill the requirements in a scalable, production-ready fashion. Those might be fine for testing or for small (e.g. single-node) deployments, but not for large-scale installations. Having a driver included in Tooz would be taken as an indication that operators can deploy using that DLM, which is an impression the project wants to avoid giving for backends that are not ready for it.
In the end, the "production ready" criterion will be used to determine which drivers are allowed in, even though that term is somewhat amorphous. It was agreed that there would be a discussion with those who develop alternate DLM drivers as part of the acceptance process to determine whether the DLM is truly meant for large-scale deployments.
The meeting broke up with a solid conclusion, and one that seems rather different from the sense of the room early on. As with other OpenStack components, the DLM piece will be handled with an abstraction layer that allows for multiple choices underneath. Like other OpenStack plugins and components, a candidate will need to pass all of the tests and have at least two maintainers to handle its care and feeding before it can be considered for inclusion. For Tooz drivers, though, the production-readiness question will need to be discussed as well.
All of that means that OpenStack sub-projects will be able to have a hard dependency on the presence of a DLM, which was, essentially, the goal set out in the Chronicles. Given the contentious nature of choosing a single one, however, it should perhaps not be a surprise that the project opted for the inclusive choice. That is very much in keeping with the OpenStack way and part of what has led to its success, as one participant noted.
[I would like to thank the OpenStack Foundation for travel assistance to Tokyo for the summit.]
