
A distributed lock manager for OpenStack?

By Jake Edge
October 28, 2015

OpenStack Summit

On the first day of the Tokyo OpenStack Summit, there was a potentially contentious topic discussed in the Design Summit: should OpenStack adopt a single distributed lock manager and, if so, which should it be? The cross-project session was broken up into two parts, the first of which targeted the first question; the second would then look to the implications of that decision. The discussion and decision provided an interesting look into some of the inner workings of the project.

Hot on the heels of the October 15 release of OpenStack Liberty, the developers gathered in Tokyo October 27–30 to determine what would be in the next release, Mitaka, which is due in April 2016. But the summit is also an opportunity to look at longer-term changes that will come in releases over the next year or two. Mike Perez, who is the cross-project developer coordinator at the OpenStack Foundation, moderated the two sessions that, apparently, were not quite as contentious as perhaps was feared.

The overall problem has been summarized in a document: "Chronicles of a distributed lock manager". There is a need for various OpenStack components to perform some operations atomically, which generally means some kind of locking solution is required. Because OpenStack is a distributed system, though, a distributed lock manager (DLM) is needed. Currently, each sub-project has dealt with the problem on its own, typically by storing a lock in its database.
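The ad hoc "lock is a database row" pattern can be pictured with a minimal sketch (illustrative only; SQLite stands in for the sub-projects' real schemas, and the names are hypothetical): a lock is held while a uniquely keyed row exists, and its main weakness is visible in the comments.

```python
import sqlite3

# Minimal sketch of the ad hoc "lock row in the database" pattern.
# Illustrative only; real sub-projects use their own schemas.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locks (name TEXT PRIMARY KEY, owner TEXT)")

def acquire(name, owner):
    """Try to take the lock by inserting a row; fails if it already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO locks (name, owner) VALUES (?, ?)",
                         (name, owner))
        return True
    except sqlite3.IntegrityError:
        return False

def release(name, owner):
    with conn:
        conn.execute("DELETE FROM locks WHERE name = ? AND owner = ?",
                     (name, owner))

# The weakness: if the owning process crashes, the row stays behind and
# the lock is never released -- one reason these ad hoc solutions are a
# problem for the overall project.
print(acquire("volume-42", "worker-a"))  # True
print(acquire("volume-42", "worker-b"))  # False
release("volume-42", "worker-a")
print(acquire("volume-42", "worker-b"))  # True
```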

The proliferation of these ad hoc solutions is becoming a problem for the overall project. In addition, there are other sub-projects that would like to have some kind of locking, but would rather not create their own. That led to the idea of choosing a DLM to ship with OpenStack that sub-projects could rely upon being present. That immediately leads to a second question: which?

There are various options for a DLM that are laid out in the Chronicles document. As might be guessed, each has its strengths and weaknesses. The discussion mostly focused on three: Apache ZooKeeper, etcd, and Consul. Each brings additional features that will be of use to some sub-projects, such as leader election and service discovery.

There was some discussion of various sub-projects and their requirements, such as for the Cinder block storage component, the Ironic bare-metal provisioning handler, and the Heat orchestration system. There were obvious parallels between each project's needs, with many needing service discovery and leader election as well as shared locks. The Chronicles document looks at even more of the sub-projects; there were a few more added to the Etherpad notes from the sessions.
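Leader election, one of those shared needs, is commonly built on the ZooKeeper recipe of sequential ephemeral nodes: each candidate registers an entry with a monotonically increasing sequence number, and the lowest live number is the leader. The following single-process sketch (hypothetical names, no real coordination service involved) illustrates the idea:

```python
import itertools

class ElectionBoard:
    """Sketch of ZooKeeper-style leader election. Illustrative only: a
    real implementation watches its predecessor's node and handles
    session loss; here a crash is modeled by simply removing an entry."""

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}       # name -> sequence number

    def join(self, name):
        self._candidates[name] = next(self._seq)

    def leave(self, name):
        # Models a crash: the ephemeral entry vanishes with its session.
        self._candidates.pop(name, None)

    def leader(self):
        # Lowest surviving sequence number wins.
        return min(self._candidates, key=self._candidates.get, default=None)

board = ElectionBoard()
for node in ("api-1", "api-2", "api-3"):
    board.join(node)
print(board.leader())   # api-1 (lowest sequence number)
board.leave("api-1")    # leader crashes; next in line takes over
print(board.leader())   # api-2
```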

One of the main questions is whether operators of OpenStack clouds would "vomit" if they were required to install a specific DLM. An informal straw poll of those in the room found that each of the major options had some opposition. While ZooKeeper has the most features, there were a number of concerns around it, largely because of its implementation language: Java. There are operators who do not want to add the Java Virtual Machine (JVM) into their operations, so the decision comes down to a "Java vs. non-Java" question (both etcd and Consul are written in Go).

But fair locks (ones that prevent starvation) can only be implemented with ZooKeeper, so there was a question about whether that feature was needed. So far, at least, there are no sub-projects that require fair locking, but it certainly seems like something that may be needed down the road. Restricting the project to a solution that cannot provide fair locking struck some as short-sighted. Others noted that there would be a chance to re-address the question in six months, since only one or two projects (likely Cinder and, possibly, Ironic) would have switched to anything new.
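The fairness property at issue can be sketched in a single process: ZooKeeper's lock recipe hands each waiter a sequence number (a sequential znode) and grants the lock strictly in that order, so no waiter starves. This stdlib-only sketch (not distributed, purely illustrative) mimics that ticket discipline:

```python
import itertools
import threading

class FairLock:
    """FIFO ("fair") lock sketch mirroring the ZooKeeper lock recipe:
    each waiter takes an increasing ticket number and proceeds only when
    every earlier ticket has finished. Illustrative only -- it is
    in-process, not distributed, and not crash-safe."""

    def __init__(self):
        self._tickets = itertools.count()
        self._cond = threading.Condition()
        self._next_to_run = 0

    def acquire(self):
        with self._cond:
            my_ticket = next(self._tickets)
            while my_ticket != self._next_to_run:
                self._cond.wait()
        return my_ticket

    def release(self):
        with self._cond:
            self._next_to_run += 1
            self._cond.notify_all()

lock = FairLock()
order = []

def worker(name):
    lock.acquire()
    order.append(name)   # critical section: record who got in
    lock.release()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(order)  # every worker got a turn, in ticket order
```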

There was a suggestion that instead of choosing one DLM, the project could adopt an abstraction layer, perhaps one based on the optional OpenStack Tooz library. That would allow those who wanted a different DLM to run it with a driver to present the common API. There was a mixed reaction to that idea as some clearly felt that an opinionated choice should be made. OpenStack Foundation Director of Engineering Thierry Carrez said that if one DLM was picked, the overall sense of the room seemed to be for ZooKeeper.
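The abstraction-layer idea follows a familiar pattern: callers code against a small common API, and an operator-chosen driver implements it. This sketch uses hypothetical names (it is not the actual Tooz interface), with a trivial in-process driver standing in for ZooKeeper, etcd, or Consul:

```python
import abc
import threading

class Lock(abc.ABC):
    @abc.abstractmethod
    def acquire(self): ...
    @abc.abstractmethod
    def release(self): ...

class Coordinator(abc.ABC):
    """Tooz-style abstraction sketch: sub-projects call this API and a
    driver for the deployed DLM fulfills it. Hypothetical names only."""
    @abc.abstractmethod
    def get_lock(self, name): ...

class LocalLock(Lock):
    """In-process lock; useful only as a stand-in for a real driver."""
    def __init__(self):
        self._lock = threading.Lock()
    def acquire(self):
        return self._lock.acquire()
    def release(self):
        self._lock.release()

class LocalCoordinator(Coordinator):
    def __init__(self):
        self._locks = {}
    def get_lock(self, name):
        return self._locks.setdefault(name, LocalLock())

# A real registry would map "zookeeper", "etcd", "consul", ... to drivers.
DRIVERS = {"local": LocalCoordinator}

def get_coordinator(url):
    driver = url.split("://", 1)[0]
    return DRIVERS[driver]()

coord = get_coordinator("local://")
lock = coord.get_lock("resize-volume-42")
lock.acquire()
# ... critical section ...
lock.release()
```

The cost the session wrestled with is visible even here: every driver added to `DRIVERS` must faithfully honor the `Lock` contract, and each real backend has its own quirks to paper over.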

But running ZooKeeper on the JVM from the OpenJDK project was of concern to some. Most deployments run ZooKeeper on the Oracle JVM, so problems specific to OpenJDK might not be addressed quickly by the ZooKeeper upstream. Running the Oracle JVM is a non-starter for some operators, however. In addition, one attendee noted that ZooKeeper isn't really a DLM but a toolkit for building one, which may make it hard for others to replicate the DLM that OpenStack builds and tests.

On the other hand, though, maintaining an abstraction layer for each DLM choice would be a burden on the project. In addition, there are going to be quirks for each one and it would be better to design around the quirks of one, rather than three (or more). But others noted that OpenStack would likely only build one (for ZooKeeper) and that others would need to fill in the abstraction layer for DLMs of interest to them.

There is an established pattern in OpenStack of having abstraction layers and being inclusive, one attendee said. But there are major advantages to having at least one DLM available, rather than zero, as is the case today. So it makes sense to focus on having at least one DLM available.

Carrez said that he had come into the session thinking that a choice for a single DLM should be made but, at the end, he was convinced that an abstraction layer was the right approach. That seemed to be agreeable to most in the room (who represented multiple sub-projects and project constituencies). It was also agreed that the default would be ZooKeeper.

After a short break, with some participants having to head off to other sessions, the implications of the decision to have an abstraction layer were discussed. First off, there were some thoughts presented about how components like Ironic could be upgraded in place from their existing database locks to something DLM-based, with minimal downtime. The basic problem is in how to migrate an existing lock from the database to the new scheme without losing track of it during the upgrade phase. Ironic developers seemed confident they had an approach that would work.
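One common pattern for that kind of migration (Ironic's actual approach was not detailed in the session, so this is a hypothetical sketch) is to hold both locks during the transition: while old and new services coexist, every new service takes the legacy database lock and the DLM lock together, so the two generations still exclude each other.

```python
class MigratingLock:
    """Transitional double lock, sketched for illustration. During the
    upgrade window, take BOTH the legacy database lock and the new DLM
    lock; once every node runs the new code, the database side can be
    dropped. `db_lock` and `dlm_lock` are hypothetical objects with
    acquire()/release() methods."""

    def __init__(self, db_lock, dlm_lock):
        self._db = db_lock      # legacy lock, still honored by old nodes
        self._dlm = dlm_lock    # new DLM-backed lock

    def acquire(self):
        if not self._db.acquire():
            return False
        if not self._dlm.acquire():
            self._db.release()  # roll back so we hold neither lock
            return False
        return True

    def release(self):
        self._dlm.release()
        self._db.release()

class FakeLock:
    """Stand-in lock for demonstration; not part of any real API."""
    def __init__(self):
        self.held = False
    def acquire(self):
        if self.held:
            return False
        self.held = True
        return True
    def release(self):
        self.held = False

db, dlm = FakeLock(), FakeLock()
lock = MigratingLock(db, dlm)
print(lock.acquire())                    # True: holds both sides
print(MigratingLock(db, dlm).acquire())  # False: database side is taken
lock.release()
```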

Using Tooz as the DLM abstraction layer seemed the obvious approach, but there are some problems with the existing drivers for Tooz. For example, the database driver can't actually provide what the Tooz API promises, so it needs to be removed. A SQL database cannot handle some of the DLM failure modes, so the driver would appear to provide DLM functionality that it actually cannot deliver. Similarly, the interprocess communication (IPC) driver may not be able to faithfully implement the API.
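The underlying issue is liveness: a real DLM ties each lock to a live client session and frees the lock when heartbeats stop, whereas a plain SQL row just sits there if its owner dies. A sketch of that session/lease behavior (illustrative only; not how any particular driver actually works):

```python
import time

class LeasedLock:
    """Sketch of DLM-style liveness: the lock is bound to a lease that
    the owner must keep refreshing ("heartbeating"). If the owner dies,
    the lease lapses and another client can take the lock -- behavior a
    plain database row cannot provide. Illustrative names only."""

    def __init__(self, ttl):
        self._ttl = ttl
        self._owner = None
        self._expires = 0.0

    def acquire(self, owner, now=None):
        now = time.monotonic() if now is None else now
        if self._owner is not None and now < self._expires:
            return False            # held by an owner with a live lease
        self._owner = owner         # free, or the old owner's lease lapsed
        self._expires = now + self._ttl
        return True

    def heartbeat(self, owner, now=None):
        now = time.monotonic() if now is None else now
        if self._owner == owner and now < self._expires:
            self._expires = now + self._ttl
            return True
        return False

lock = LeasedLock(ttl=10.0)
print(lock.acquire("node-a", now=0.0))    # True
print(lock.acquire("node-b", now=5.0))    # False: node-a's lease is live
print(lock.acquire("node-b", now=20.0))   # True: node-a stopped heartbeating
```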

There is a question of how to decide which drivers will be accepted into Tooz. The concern is that drivers might be written for DLMs that cannot truly fulfill the requirements in a scalable, production-ready fashion. They might be fine for testing or for small (e.g. single-node) deployments, but not for large-scale installations. Having a driver included in Tooz would signal to operators that the underlying DLM is safe to deploy, and that is a signal the project wants to avoid sending for systems that are not ready.

In the end, the "production ready" criterion will be used to determine which drivers are allowed in, even though that term is somewhat amorphous. It was agreed that there would be a discussion with those who develop alternate DLM drivers as part of the acceptance process to determine whether the DLM is truly meant for large-scale deployments.

The meeting broke up with a solid conclusion and one that seems rather different from the sense of the room early on. As with other OpenStack components, the DLM piece will be handled with an abstraction layer that allows for multiple choices underneath. Like other OpenStack plugins and components, a candidate will need to pass all of the tests and have at least two maintainers to handle its care and feeding before it can be considered for inclusion. For Tooz drivers, though, the production-oriented question will need to be discussed as well.

All of that means that OpenStack sub-projects will be able to have a hard dependency on the presence of a DLM, which was, essentially, the goal set out in the Chronicles. Given the contentious nature of choosing a single one, however, it should perhaps not be a surprise that the project opted for the inclusive choice. That is very much in keeping with the OpenStack way and part of what has led to its success, as one participant noted.

[I would like to thank the OpenStack Foundation for travel assistance to Tokyo for the summit.]

Index entries for this article
Conference: OpenStack Summit/2015


A distributed lock manager for OpenStack?

Posted Oct 29, 2015 12:45 UTC (Thu) by isotopp (subscriber, #99763) [Link] (5 responses)

Out of all the backends that Tooz supports, only one choice (fully distributed ZooKeeper) can actually keep the promises that Tooz makes, assuming the Tooz layer does not introduce additional problems. One other choice (non-distributed ZooKeeper) is a valid choice for testing, being able to give realistic single-host test results. The remaining backend choices (file, ipc, kazoo, memcache, mysql, postgres, and redis) are all broken as DLM choices and can never fulfill the functional promises of Tooz. They have no reason to exist.

A distributed lock manager for OpenStack?

Posted Oct 31, 2015 20:21 UTC (Sat) by harlowja (guest, #97846) [Link] (4 responses)

As one of the authors of tooz, I agree that some should probably be deprecated. Redis and kazoo should likely be kept (kazoo == zookeeper, redis-py == redis); the odd one there is redis. It has been gaining properties that make it more of what I call the 'poor man's zookeeper', although it has become more of a swiss army knife than a surgical tool (just look at http://redis.io/commands/)...

A distributed lock manager for OpenStack?

Posted Nov 1, 2015 19:50 UTC (Sun) by isotopp (subscriber, #99763) [Link] (3 responses)

You want to keep stuff? Run Jepsen on it. If it fails in distributed mode, it should probably die. Redis does not survive Jepsen testing (http://aphyr.com/tags/jepsen).

A distributed lock manager for OpenStack?

Posted Nov 2, 2015 18:43 UTC (Mon) by harlowja (guest, #97846) [Link] (2 responses)

Understood, but then what really has survived jepsen (besides zookeeper)?

I'm more along the lines of listing the various capabilities (strengths and weaknesses) of the underlying implementations and letting the smart user select one that is appropriate. Jepsen testing is like ensuring you have a battleship, but sometimes you only need a powerboat or a canoe (for lack of a better analogy)...

A distributed lock manager for OpenStack?

Posted Nov 2, 2015 21:36 UTC (Mon) by isotopp (subscriber, #99763) [Link] (1 responses)

ZK, etcd, and Consul have (Consul on its second try). And that's actually quite the point here: these things are built to manage a cluster properly by building the inner cluster (ZK calls it the "ensemble") and, once that is stable, to provide a platform and service for the outer cluster.

Services provided usually include at least discovery: things such as hypervisors, running VMs, and *-api instances could check in with the ensemble and leave their endpoint data with it, making things such as keystone.endpoints and similar tables superfluous and much more accurate for degraded states (also, HA for free). Other common services are locking (the original reason for the project) and, with locking, global order/sequence numbers/counters. There is also a kind of highly available, distributed, small key/value store, in Consul's case with a config file generator attached.

In the end, rigorous application of such a tool to the entire openstack project would greatly benefit the project in handling degraded cases, HA, and locking, turning it from a web system into a proper control system/state engine. That is, it would become much more reliable. For this, the openstack project would have had to commit itself to a single one of these systems in order to be able to reliably use that system's feature set.

With an abstraction layer in between you will only get what the weakest of these systems can give you (hence Redis needs to die), and you would need to Jepsen the shit out of every possible configuration and service provided in order to make sure the abstraction on top of the underlying consensus system doesn't break stuff.

A distributed lock manager for OpenStack?

Posted Nov 2, 2015 23:49 UTC (Mon) by harlowja (guest, #97846) [Link]

Totally agree with the one solution; it was what I (and others) thought would be an outcome of that session (which I was the other driver of), but it didn't turn out this way in the end (even though I tried to say we should really be opinionated). As for the other drivers, yes, the point is well taken. Jepsen is a good base case and it might be the way we should handle new drivers (if they have not been proven Jepsen-safe, then they aren't allowed); redis would then get kicked out due to this (although for small enough deployments, perhaps having a single redis that uses persistent storage and can *manually* fail over to a secondary master will be fine). Overall we (as a tooz group) are figuring out the way to go here, and there seem to be a few ideas (like this one) being bounced around.

A distributed lock manager for OpenStack?

Posted Oct 29, 2015 19:44 UTC (Thu) by alfille (subscriber, #1631) [Link] (1 responses)

This article seems thorough, but would benefit from presenting a little background.

A distributed lock manager for OpenStack?

Posted Nov 2, 2015 19:49 UTC (Mon) by harlowja (guest, #97846) [Link]

The document @ https://review.openstack.org/#/c/209661/ should hopefully help here (created by yours truly).

A distributed lock manager for OpenStack?

Posted Oct 30, 2015 12:02 UTC (Fri) by skitching (guest, #36856) [Link]

If the features of ZooKeeper match the requirements, but the required runtime is a problem, then why not

(a) precompile the bytecode to native code (eg via gcj), or
(b) start a project to port ZooKeeper to another language (eg rust) which compiles to native code?

ZooKeeper is a reasonably stable project now, and isn't huge. A direct 1:1 reimplementation seems feasible.


Copyright © 2015, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds