LWN.net Weekly Edition for July 9, 2015
A better story for multi-core Python
Running the standard Python interpreter in threads on multiple CPU cores has always resulted in a smaller performance gain than one might naively think—or hope for. Because of the CPython global interpreter lock (GIL), only one thread of execution can be running in the interpreter core at any given time. Removing the GIL has long been a topic of discussion in Python circles, and various alternative Python implementations have either removed or worked around the GIL. A recent discussion on the python-ideas mailing list looked at a different approach to providing a better multi-core story for Python.
In a post that was optimistically titled "solving multi-core Python", Eric Snow outlined an approach that relies not on removing the GIL, but on "subinterpreters" and a mechanism to share objects between them. The multi-core problem is partly a public-relations problem for the language, Snow said, but it needs solving for that and other, more technical reasons.
Subinterpreters
The basic principle behind Snow's proposal is to take the existing subinterpreter support and to expose it in the Python language as a concurrency mechanism. The subinterpreters would run in separate threads, but would not generally share data with each other, at least implicitly, unlike the typical use of threads. Data would only be exchanged explicitly via channels (similar to those in the Go language). One of the main influences for Snow's thoughts (and for Go's concurrency model) is Tony Hoare's "Communicating Sequential Processes".
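Snow's post did not pin down an API, so any concrete code is necessarily speculative. As a purely illustrative sketch, with every name invented for the occasion (an interpreters module, create(), create_channel(), and run()), subinterpreters and channels might be used something like this:

    # Hypothetical API -- none of these names come from Snow's proposal,
    # which deliberately left the details unspecified.
    import interpreters

    # Each subinterpreter would run in its own thread, with its own
    # isolated module and object state.
    interp = interpreters.create()

    # Channels (as in Go) would be the only way to exchange data;
    # initially, only immutable objects could be sent through them.
    reader, writer = interpreters.create_channel()

    interp.run("""
    total = sum(range(10 ** 6))
    channel.send(str(total))    # send an immutable object
    """, channel=writer)

    print(reader.recv())        # blocks until the subinterpreter sends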
Handling objects shared between subinterpreters is one of the areas that requires more thought, Snow said. One way forward might be to only allow immutable objects to be shared between the subinterpreters. In order to do that, though, it probably makes sense to move the reference counts (used for garbage collection) out of the objects themselves and into a separate table. That would allow the objects themselves to be truly unchanging, which could also help performance in the multi-processing (i.e. fork()) case by avoiding page copies (via copy-on-write) of objects that are simply being referenced again, as Nick Coghlan pointed out.
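The underlying problem is easy to demonstrate: because CPython stores the reference count in the object itself, merely binding a new name to an object writes to the object's memory. A small sketch:

    import sys

    obj = "a logically immutable string"

    # sys.getrefcount() reports the count, plus one for its own
    # temporary argument reference.
    before = sys.getrefcount(obj)

    another_name = obj      # no data is copied, but CPython increments
                            # the count stored inside the object itself
    after = sys.getrefcount(obj)
    print(after - before)   # prints 1

    # After a fork(), that same increment in a child process dirties
    # the page holding the object, defeating copy-on-write sharing.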
Another area that needs consideration is what the restrictions on subinterpreters would be. If, for example, subinterpreters were not allowed to start new threads, they would be single-threaded and not require a GIL. Alternatively, the GIL for subinterpreters could be replaced with a "local interpreter lock", with the main GIL used in the main interpreter and to mediate interaction between subinterpreters. There is also a question about using fork() in subinterpreters; in his initial email, Snow suggested disallowing it, but in the discussion that followed, he seemed to rethink that position.
The proposal is clearly a kind of early-stage "request for comment" (or "a shot over the bow", as Snow put it), but it did spark quite a bit of discussion and some fairly favorable comments. Yury Selivanov, for example, was quite interested in the idea, noting that just being able to share immutable objects would be useful.
Concerns
But Gregory Smith was concerned about the impact of each subinterpreter needing to re-import all of the modules used by the main interpreter, since those would not be shared. That would reduce the effectiveness of Snow's model. On the other hand, though, Smith sees a potential upside as well: "I think a result of it could be to make our subinterpreter support better which would be a good thing." Several suggestions were made for ways to speed up the startup time for subinterpreters or to share more state (such as modules) between the interpreters.
Several in the thread believed that the existing, fork()-based concurrency was the right way forward, at least for POSIX systems; Devin Jeanpierre, for example, argued for that approach. While fork() does provide real benefits, though, it is only available on POSIX systems. It is also different from Snow's goal, "to make it obvious and undeniable that Python (3.6+) has a good multi-core story", which is partly a matter of public perception. The subinterpreter idea is just a means to that end, Snow said, and he would be happy to see a different solution if it fulfilled that goal. In the meantime, his proposal has some characteristics that multi-processing with fork() lacks.
But Sturla Molden pointed to the lack of fork() on Windows as one of the real reasons behind Snow's proposal: "It then boils down to a workaround for the fact that Windows cannot fork, which makes it particularly bad for running CPython". As Snow said, though, Python cannot ignore Windows. Beyond that, even with the "superior" fork() solution available, the perception of multi-core Python is much different.
Molden replied with a long list of answers to the "FUD" that is promulgated about Python and the GIL, but that doesn't really change anything. That is why Snow's goal is to make multi-core support "obvious and undeniable". It also seems that Molden is coming from a scientific/numeric Python background, which is not generally where the complaints about Python's multi-core support originate, as Coghlan noted.
Shared data
The reasoning behind restricting the data shared between interpreters to immutable types (at least initially) can be seen from a question asked by Nathaniel Smith. He wondered how two subinterpreters could share a complicated data structure containing several different types of Python objects.
Snow acknowledged that concern and suggested avoiding the "trickiness involved" in handling that kind of data by sticking to immutable objects, though there may be "some sort of support for mutable objects" added later, he said.
Coghlan summarized Snow's proposal as really being three separate things:
- Filing off enough of the rough edges of the subinterpreter support that we're comfortable giving them a public Python level API that other interpreter implementations can reasonably support
- Providing the primitives needed for safe and efficient message passing between subinterpreters
- Allowing subinterpreters to truly execute in parallel on multicore machines
All 3 of those are useful enhancements in their own right, which offers the prospect of being able to make incremental progress towards the ultimate goal of native Python level support for distributing across multiple cores within a single process.
In addition, Coghlan has published a summary of the state of multi-core Python that looks at the problem along with alternatives and possible solutions. It is an update from an earlier entry in his Python 3 Q&A and is well worth a read to get the background on the issues.
There seems to be enough interest in Snow's proposal that it could be on the radar for Python 3.6 (which is roughly 18 months off). There is a long road before that happens, though. A PEP will have to be written, as will a good bit of code. We also have yet to see what Guido van Rossum's thoughts on the whole idea are, though Snow did mention some discussions with Python's benevolent dictator for life in his initial post. As Nathaniel Smith put it, Snow's approach seems like the "least impossible" one. That is not the same as "possible", of course, but it seems hopeful at least.
A preview of PostgreSQL 9.5
The PostgreSQL developers voted at the 2015 developer meeting to release an alpha version of PostgreSQL 9.5 as soon as possible. As it turns out, that meant July 2nd, so the 9.5 alpha is available now. While this is only a preview release of 9.5, it's full of cool features for database geeks, including "UPSERT", more JSONB utilities, row-level security, and better multicore scaling. However, long-time PostgreSQL users will notice that 9.5 is a bit behind schedule at this point.
Why an alpha?
One question users have is why the PostgreSQL project isn't issuing a beta release at this point, as it normally would in June or July. The answer has to do with the reliability bugs the database has had over the last six months. Due to those problems, the PostgreSQL committers have become very cautious about new features that might cause unexpected reliability or security issues, and want to reserve the right to modify or cancel features before the final release. Historically, the project has tried to freeze all APIs by the first beta, so the alpha label signals that the APIs are not yet frozen.
The primary reliability issues for the database centered around special transaction counters called "multixacts" that help the database keep track of data visibility when multiple concurrent sessions touch the same rows. Changes made to the multixact mechanism in PostgreSQL 9.3 in order to make foreign keys more efficient had a number of unexpected side effects, including some data-loss bugs that led to multiple update releases within a relatively short period. One of the things that may delay PostgreSQL 9.5 is that developers are still working on the last known issues with multixacts, and plan to fix them before the final release.
One of PostgreSQL's most competitive features is its reputation for ensuring zero data loss. As such, the developers also want to make certain that none of the new features that make changes to data storage or the database transaction log will cause data loss as a side effect. This means more pre-release testing than was done for prior releases. One such at-risk feature is automated transaction log compression to reduce I/O; the other is UPSERT.
UPSERT
One feature which application developers switching from MySQL to PostgreSQL have missed is "INSERT ON CONFLICT UPDATE", otherwise known as UPSERT. This SQL feature allows developers to not worry about whether the row they are adding is new or not, simplifying application programming. It also can eliminate the need for multiple round-trips to the database to check to see if another user has concurrently added the same information.
PostgreSQL finally has UPSERT, thanks to Heroku engineer Peter Geoghegan. The version of the syntax in the alpha looks like this:
    INSERT INTO users (user_id, email, login)
    VALUES (1447, 'josh@postgresql.org', 'jberkus')
    ON CONFLICT (user_id)
    DO UPDATE SET email = EXCLUDED.email, login = EXCLUDED.login;
What the above says is: insert this user if it is not already present, but if the user ID is already in the table, then update the email and login fields instead. The special EXCLUDED keyword says "do this only with rows that conflicted", and allows UPSERT to work with batch imports of data as well as single rows.
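From application code, that means the usual SELECT-then-INSERT-or-UPDATE dance, and its race conditions, simply disappear. Here is a rough sketch using the psycopg2 driver, with a made-up connection string and the users table from the example above:

    import psycopg2

    conn = psycopg2.connect("dbname=example")    # hypothetical DSN
    cur = conn.cursor()

    users = [
        (1447, 'josh@postgresql.org', 'jberkus'),
        (1448, 'someone@example.com', 'someone'),
    ]

    # One statement per row, whether the row is new or not; there is
    # no need to check for the row's existence first.
    cur.executemany(
        """INSERT INTO users (user_id, email, login)
           VALUES (%s, %s, %s)
           ON CONFLICT (user_id)
           DO UPDATE SET email = EXCLUDED.email,
                         login = EXCLUDED.login""",
        users)
    conn.commit()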
For other situations, you can also choose to DO NOTHING. For example, imagine that you're collecting "likes" for a social web application; if a user tries to insert the same "like" twice, you simply want to ignore it, like so:
    INSERT INTO likes (user_id, video_id)
    VALUES (1447, 20135)
    ON CONFLICT DO NOTHING;
Adding UPSERT to PostgreSQL has taken Geoghegan more than two years of work, including a lot of "back to the drawing board" redesigns. One of the reasons for the long development period, as well as for the long wait for a feature that has been in MySQL for years, is the project's need to accommodate power users. In addition to the above examples, the new UPSERT supports designating specific table constraints as the conflict target and creating arbitrarily complex rules for handling the rejected data, including nesting conflict checks through subqueries and WITH clauses. The new feature was also required to work seamlessly with replication and the new row-level security. And, since UPSERT is not part of the SQL standard, the project spent a lot of time arguing about the desired syntax for the feature.
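For example, a conflict can be tied to a named constraint rather than to a column list, and the DO UPDATE clause can carry its own WHERE condition so that only some conflicting rows are rewritten. A hedged sketch of what that might look like, again as a query string for a Python driver (the constraint name users_pkey is assumed, not taken from the alpha's documentation):

    # Illustrative only: "users_pkey" is an assumed constraint name.
    # The trailing WHERE clause skips rows whose email is unchanged.
    # This string would be passed to cursor.execute() along with a
    # (user_id, email, login) tuple.
    POWER_UPSERT = """
        INSERT INTO users (user_id, email, login)
        VALUES (%s, %s, %s)
        ON CONFLICT ON CONSTRAINT users_pkey
        DO UPDATE SET email = EXCLUDED.email, login = EXCLUDED.login
        WHERE users.email IS DISTINCT FROM EXCLUDED.email
    """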
More than anything, though, the difficult part of developing the feature was making it work correctly in high-concurrency environments. UPSERT needed to yield the correct result, and avoid corrupting data, even if 50 users were trying to UPSERT the same row at the same time. This need for "bulletproof concurrency" was the biggest thing delaying the feature, as well as the biggest reason for committers' concern that it might cause unanticipated issues in the future. Regardless, the long wait has resulted in a much more powerful UPSERT feature than Geoghegan originally specified, so it was probably worth the wait.
PostgreSQL as a document database
While it is adding new SQL features, the project also seems to be hard at work re-implementing itself as a "NoSQL" database competitor. The project has some grand plans for document-database support in future years, but version 9.5 already includes a bunch of new built-in functions and operators to make PostgreSQL a better JSON document database right away; various developers have also been creating external tools to make it friendlier to non-relational applications. JSON is a standard serialization format for object data, and JSONB is PostgreSQL's binary storage format and data type for that kind of data.
The central new built-in function is jsonb_set(), a function that allows users to update any arbitrary key within a nested JSONB document, for example:
    SELECT profile FROM profiles WHERE user_id = 1447;

     '{ "type" : "i", "clubs" : { "chess club" : { "role" : "member" } } }'

    UPDATE profiles
       SET profile = jsonb_set(profile, ARRAY['clubs','chess club'],
                               '{ "role" : "chair" }', TRUE)
     WHERE user_id = 1447;

    SELECT profile FROM profiles WHERE user_id = 1447;

     '{ "type" : "i", "clubs" : { "chess club" : { "role" : "chair" } } }'
The jsonb_set() statement above adds the '"chess club" : { "role" : "chair" }' document to the user's list of clubs nested inside their profile (a JSONB column in the table), or updates their chess club membership to "chair" if they were already a member. Since it lets users modify nested keys "in place", without parsing the entire JSONB document in the application or installing the PostgreSQL PL/v8 extension to run JavaScript inside the database, this feature makes much more meaningful document-database workloads possible on PostgreSQL. In addition to jsonb_set(), 9.5 includes new functions and operators that support JSONB concatenation, key deletion, and aggregating data in tables into complex JSONB objects.
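The concatenation and deletion operators are simple to experiment with. A couple of invented examples, again wrapped as query strings for a Python driver: || merges two JSONB values, overwriting duplicate keys, while the - operator removes a key:

    # Invented values; the results shown in the comments are what
    # PostgreSQL 9.5 returns for these expressions.
    CONCAT_EXAMPLE = """
        SELECT '{"role": "member"}'::jsonb || '{"since": 2015}'::jsonb;
        -- {"role": "member", "since": 2015}
    """

    DELETE_KEY_EXAMPLE = """
        SELECT '{"role": "member", "since": 2015}'::jsonb - 'since';
        -- {"role": "member"}
    """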
While the built-in functions allow users to do a lot, they don't support data searches of arbitrary complexity for applications involving large populations of JSONB documents. JsQuery, released this year by PostgreSQL's contributor team from Moscow, adds a special JSONB search language to PostgreSQL, and new indexes to support it. This new language supports wildcards, range searches, and boolean logic. For example, if you wanted to search for the chair of the chess club, you could run this JsQuery:
SELECT user_id FROM profiles WHERE profile @@ 'clubs."chess club".role = chair';
PostgreSQL 9.5 with JsQuery can therefore be used by developers who want to abandon the relational model entirely and just store a collection of documents in the database. Several projects have been created over the last couple of years to take advantage of this and wrap PostgreSQL in a NoSQL API, both to ease migration from MongoDB and other databases, and to allow the creation of "hybrid database applications" with both non-SQL and SQL-based access. One of the most recent of these projects is BedquiltDB by Shane Kilkelly, which supports users who want to use the MongoDB syntax to modify and search data in PostgreSQL.
For users who want a fully relational database behind a document-oriented API, ToroDB, by the Spanish consulting firm 8KData, is also new this year. ToroDB accepts data requests and updates using the MongoDB protocol; data is automatically decomposed into relational tables and transformed back into JSON documents for client requests. At the Big Data Spain conference, developer Álvaro Hernández Tortosa claimed that this kind of hybrid database is more flexible and scales better for very large data sets than pure non-relational approaches.
Regardless of which tools end up being the most successful, it seems that the PostgreSQL community plans to take on a lot of current and future document database workloads. The next couple of years of competition with non-relational databases should be interesting.
Row-level security
For the last three major PostgreSQL releases, the project has been adding features to allow increasingly specific data-security controls. These have included column-level permissions, security "labels", integration with SELinux, and, in 9.5, row-level security (RLS). RLS allows administrators to specify rule-based permissions for each individual row in a table; the feature is also known by names like "virtual private database" and "fine-grained security". RLS has been in demand by users with strong security needs around their data, such as credit-card processors and healthcare companies.
RLS is disabled by default on PostgreSQL tables. However, it's easy to enable and the syntax is straightforward. For example, say you wanted to allow users to read their own profiles, but not other people's. You could take advantage of the special database variable current_user and check that the current database user matched the login column of the table profiles, like so:
    ALTER TABLE profiles ENABLE ROW LEVEL SECURITY;

    CREATE POLICY read_own_data ON profiles
        FOR SELECT USING (current_user = login);
Much more sophisticated policies are possible, including arbitrary constraints and setting specific policies for specific database login roles. RLS can also be combined with column permissions to effectively give "cell-level" permissions control.
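As an illustration of combining the two (the role and policy names here are invented), a support role might be allowed to read every row while being kept away from a sensitive column:

    # Invented names; a sketch of stacking RLS and column permissions.
    SUPPORT_SETUP = """
        -- Broader row access, but only for one login role:
        CREATE POLICY support_read_all ON profiles
            FOR SELECT TO support_role
            USING (true);

        -- Column permissions stack on top for "cell-level" control;
        -- support staff can read rows, but never the email column:
        REVOKE SELECT ON profiles FROM support_role;
        GRANT SELECT (user_id, login) ON profiles TO support_role;
    """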
For the past three years, a lot of the work to bring about RLS has been driven by NEC engineer KaiGai Kohei. In the 9.5 development cycle, that work was taken up by Dean Rasheed and Stephen Frost. While Frost's involvement in security features for PostgreSQL is longstanding, another reason for his interest in RLS became apparent on May 22. On that day, the US National Reconnaissance Office (NRO) announced that it was rolling out a relational, geospatial database solution that would support "multilevel security" — the first of its kind. The partnership to deliver this database consists of Lockheed Martin, Red Hat, Seagate, and Frost's employer, Crunchy Data Solutions.
Multilevel security (MLS) is a design for data control that centers around the idea that different personnel should be able to see different data based on their clearance level. Lower-level staff should not even be aware that high-level data exists, and in some cases should be given misleading data in its place. MLS is popular with intelligence agencies, some of whom have been looking to add it to PostgreSQL as far back as 2006. The NRO, which manages US spy satellites, is an obvious user of such a system.
According to various press releases, Red Hat SELinux security policies combined with PostgreSQL RLS deliver effective MLS for the agency's Centralized Supercomputing Facility. Exact details on the implementation are not yet available, but the NRO seems to be prepared to put all of the code for it into GitHub projects. This seems to be part of a trend in the US Department of Defense to release various components as open source, showing that at least one part of the US government believes that open code is also more secure.
Even if you don't work for an intelligence agency, though, there are uses for RLS for securing more mundane data like password tables.
Multicore scalability
An area that the PostgreSQL project works on constantly is multicore scalability. While developers are working on scale-out to multiple servers, users also want to run PostgreSQL on bigger and bigger machines. Version 9.5 will bring substantial improvements in read-request throughput on high-core-count machines, such as the IBM POWER8 machines on which PostgreSQL 9.5 was tested. As these machines offer 24 cores and 192 hardware threads, they make a good target for multicore scalability. IBM's Stewart Smith has been using the same kind of system to push MySQL up to one million queries per second.
PostgreSQL 9.4 peaked at around 32 concurrent requests, with overall throughput dropping beyond that even if there were idle cores available. According to EnterpriseDB engineer Amit Kapila, multicore scalability is a matter of improving lock handling: eliminating as many locks as possible and reducing the cost of the others. To this end, Andres Freund rewrote PostgreSQL's "lightweight locks" (LWLocks) mechanism to use atomic operations, on processors where they are supported, instead of spinlocks. That reduced the CPU contention caused by waiting for locks and sped up lock acquisition.
To further improve throughput, Robert Haas reduced the amount of time that the database holds locks in order to evict buffers from memory, and increased the number of mapping partitions for buffers from 64 to 128. That work, combined with the LWLock improvement, means that PostgreSQL 9.5 now scales smoothly to 64 concurrent requests and delivers up to double the read-only throughput of 9.4, increasing from 300,000 transactions per second to over 500,000 in Kapila's tests [PDF]. Note that Kapila is using a different benchmark than Smith is, so the PostgreSQL and MySQL numbers are not directly comparable.
In version 9.5, the developers have also decreased memory requirements per backend and added transaction log compression to improve memory and I/O performance. Work in PostgreSQL 9.6 is now focusing on other areas of less-than-optimal performance on large servers, such as those with large amounts of memory. Memory management on servers with over 256GB of RAM is inefficient, sometimes causing large amounts of RAM to have little or no benefit for users. Ideas to fix this are under discussion.
Conclusion
There are, of course, more features than the above. The foreign data wrappers facility now supports importing remote database schemas, partitioning across multiple foreign tables, and using index scans on the remote database. The SKIP LOCKED query qualifier makes PostgreSQL a better database for storing application queues. Replication and failover have become more reliable with the pg_rewind tool and other changes.
Version 9.5 also includes a bundle of features targeting "big data" use cases, including Block Range Indexes (BRIN) for indexing very large tables, faster data sorting, and the data-analytics features GROUPING SETS and CUBE. These will be covered in an upcoming article when 9.5 beta 1 is released.
The PostgreSQL project expects to release a beta every month starting in August, until 9.5 is ready for release sometime in late 2015. Historically, the project has released new versions in September, but due to falling behind schedule, mid-to-late October is considered more likely at this point. In the meantime, the alpha release is available for download, including as a Docker container. The PostgreSQL project would like you to try it out for yourself, see if the features are what they're promoted to be, and report a few bugs while you're at it.