PGCon 2014: Clustering and VODKA
The eighth annual PostgreSQL developer conference, known as PGCon, concluded on May 24th in Ottawa, Canada. This event has stretched into five days of meetings, talks, and discussions for 230 members of the PostgreSQL core community, which consists of both contributors and database administrators. PGCon serves to focus the whole PostgreSQL development community on deciding what's going to be in next year's PostgreSQL release, as well as on showing off new features that contributors have developed. This year's conference included meetings of the main PostgreSQL team and the Postgres-XC team, a keynote by Dr. Richard Hipp, and new code to put VODKA in your database.
![PostgreSQL developers [Developer meeting group photo]](https://static.lwn.net/images/2014/pgcon-devmeeting-sm.png)
In many ways, this year's conference was about hammering out the details of many of the new ideas introduced at last year's conference, where Postgres-XC 1.0, the new JSON storage format, and the new data change streaming replication method were all introduced. While some of these features have code in PostgreSQL 9.4 beta, all of them need further work and development, and that's what people were in Ottawa to discuss. They were also there to discuss satellites, code forks, and SQLite.
ESA data and Postgres-XC
PGCon week started out with meetings of the developers most concerned with clustering and horizontal scalability for PostgreSQL: the Postgres-XC meeting and the clustering summit. Both events were sponsored by NTT, the Japanese telecom.
Postgres-XC is a fork of PostgreSQL that is intended to support high-consistency transactional clustering for write scalability in order to support workloads that need a very high volume of small writes of valuable data. Examples of this workload include stock trading, airline bookings, and cell phone routing; it is the same use-case that is filled by Oracle RAC. While this was the working meeting of the Postgres-XC developers, two things made it interesting this year: a presentation by Krzysztof Nienartowicz of the European Space Agency (ESA), and the announcement of the Postgres-XL fork of Postgres-XC.
Nienartowicz presented on the Gaia sky survey satellite, recently launched by the ESA. It will operate at the Sun-Earth L2 Lagrangian point and use a special mirror arrangement projecting onto digital receptors in order to take broad survey images of stars and other celestial objects at a higher resolution than has ever been achieved before. The ESA's plan is to scan the whole sky over a period of five years, section by section, recording every visible object an average of 80 times in order to record motion as well as position, and to categorize and identify over one billion objects.
The satellite will send down gigabytes of data per day from its 938-million-pixel camera, eventually yielding several hundred terabytes. This presents the ESA with a tough data-management problem. Right now, the ESA is using PostgreSQL for mapping, metadata, and categorization because the Gaia team likes PostgreSQL's analytic capabilities. In particular, the team has designed machine-learning algorithms in Java, which are run both outside the database (connecting via OpenJPA) and inside it using PL/Java. These algorithms help scientists by doing some automated classification of objects based on distance, behavior, motion, and luminosity. PostgreSQL also allows them to collaborate with the many astronomical centers across Europe by distributing complex data as PostgreSQL database backups.
However, the data volume will soon outgrow what can reasonably be managed with mainstream PostgreSQL. For that reason, the ESA is planning to build a large Postgres-XC cluster in order to be able to run analytics across a larger number of machines. The Gaia team plans to contribute to the Postgres-XC project as well, so that the project can meet the ESA's needs for data scale.
Postgres-XC forked
The proposal that dominated the rest of the Postgres-XC meeting, however, was Mason Sharp's presentation of his fork of Postgres-XC, called Postgres-XL. Previously, Sharp had used his clustering knowledge from the Stado project (formerly "ExtenDB", then "GridSQL") to create a proprietary fork of Postgres-XC called StormDB. That fork was the product of a startup launched in 2012, which operated for two years before being purchased by fellow PostgreSQL-based clustering startup TransLattice. Sharp was then permitted by TransLattice to open-source StormDB as "Postgres-XL" in April 2014.
Postgres-XL differs from Postgres-XC in several ways. First, it has a better logo. It is somewhat more focused on the analytics use case than on transaction processing, including pushing more work down to the individual nodes through changes in how query plans are optimized. More importantly, Sharp has been able to drop Postgres-XC's requirement to send all queries through a few "coordinator" nodes by allowing each cluster node to act as its own coordinator. Postgres-XL has also added some "multi-tenant" features aimed at running it as a public service, so that user data is strictly segregated.
The biggest difference in the fork, though, is that Sharp chose to emphasize stability and eliminating bugs over adding new features. For the last couple of years, Postgres-XC has been very focused on adding as many core PostgreSQL features as possible and constantly merging new source code from the upstream project. Postgres-XL, in contrast, is still based on PostgreSQL 9.2 (the previous version), and has disabled several features, such as triggers and auto-degrade of transaction control, which had been sources of reliability issues for Postgres-XC.
As good forks do, this provoked a lot of discussion and re-evaluation among the Postgres-XC developers and users. After much discussion, the Postgres-XC team decided that it, too, would emphasize stability and bug elimination in its next release. It remains to be seen whether the two projects will merge, though; for one thing, TransLattice chose to open-source Postgres-XL under the Mozilla Public License rather than the PostgreSQL License.
Developer meeting
Another PGCon event is the annual PostgreSQL Developer meeting, which is where the project hackers coordinate and discuss strategy and projects for the next year. Among the highlights from this meeting were:
Simon Riggs discussed his work with the European Union's (EU) AXLE project; AXLE stands for "Analytics on eXtremely Large European data". This EU-funded project focuses on "operational business intelligence", meaning analytics of current business and government data in the EU. The EU has chosen to run it as a mostly open-source effort, and has selected PostgreSQL as the primary database for the project.
Various AXLE projects will be contributing features and tools for PostgreSQL over the next two years. Among their goals are: security and encryption suitable for medical data, better performance and support for very large tables, GPU and FPGA integration for query execution, and analytic SQL functions.
The developers also discussed making several changes to the current CommitFest process, which the PostgreSQL project uses to manage new patches for the database. First, there will be a new, Django-based application for patch management; PostgreSQL has chosen to "roll its own" rather than use Gerrit in order to achieve tighter integration with the postgresql.org mailing list archives. Second, there will be five CommitFests over nine months in the upcoming year instead of four over seven months, making the development part of the year longer and shortening the overly long beta period. The committers hope that this will take some time pressure off of the development cycle, as well as give contributors more time to respond to user feedback from the previous year's release.
Finally, Peter Geoghegan suggested that the project allow pull requests instead of requiring patches for new feature submissions. The other developers raised various problems with this idea, the largest of which were issues around the use of rebase, merge, and squash merge; none of those approaches was thought to preserve the level of history that PostgreSQL wants to maintain. Merging retains too much extraneous activity from continuously merging from upstream as well as from minor bug-fix commits, while rebase and squash merge can eliminate all development history on large features, preventing committers from evaluating which alternate approaches the submitter already tried. For now, PostgreSQL retains a patch-based submission process.
The developers then had a long discussion about making PostgreSQL scale to more cores and more RAM, and about some of the obstacles to doing so. For example, there are currently two locks on PostgreSQL's shared memory, the "buffer free list" lock and the "buffer mapping" lock, that are highly contended; ways to make them less of a bottleneck were proposed. In one of the most promising, Andres Freund proposed eliminating the lightweight lock taken on the buffer by read-only transactions. The developers also plan to use perf for additional profiling of the "clock-sweep" code that frees buffers in PostgreSQL's shared buffer cache.
Another big way to improve scalability, Freund proposed, is to use the CPU's atomic operations rather than spinlocks where reasonable. Developers discussed how to handle older platforms that don't support atomic operations: whether it makes more sense to fall back to spinlocks automatically, or to drop support for those platforms (such as ARM6 and PA-RISC) entirely. The next step is to assemble a chart of which atomic operations are supported on which platforms.
Other topics were discussed at the meeting, such as data access auditing and eliminating requirements to log in as the superuser. There was a long discussion about how to avoid some of the bugs that appeared in PostgreSQL 9.3, which has had more critical patch updates than any release in a decade. The project will also be considering whether it is reasonable to emulate the Linux Foundation model of having a couple of committers paid by a PostgreSQL non-profit to do review and maintenance work on PostgreSQL full-time.
SQLite and PostgreSQL
![Dr. Richard Hipp [Dr. Richard Hipp]](https://static.lwn.net/images/2014/pgcon-hipp-sm.png)
This year's keynote was delivered by Dr. Richard Hipp, the inventor of SQLite, which is a widely-used embedded SQL database. SQLite was created in 2000, and today this open-source SQL database is part of over 500,000 applications and devices, including the iPhone, Firefox, Dropbox, Adobe Lightroom, and the Android TV.
It might seem strange to have the founder of a different database system give the keynote to a PostgreSQL conference, but Hipp was invited because of his well-known respect for PostgreSQL, and because many users consider SQLite to be "embedded Postgres". He explained how, when he created SQLite, the syntax and semantics were originally based on PostgreSQL 6.5. He chose PostgreSQL because unlike other SQL databases at the time, it always returned correct results and didn't crash. Even today, "WWPD" for "What Would Postgres Do" is a mantra of the SQLite development team.
Also, both database systems share a love of SQL and are quite complementary. While PostgreSQL is a scalable server database, SQLite is a replacement for data file storage for applications. Hipp called it "a replacement for fopen()". Instead of a "pile of files", the database offers a clean and consistent data storage interface which is more resistant to corruption and more versatile than application-specific XML and binary files. Hipp went on to suggest that several existing programs, such as OpenOffice and Git, would be significantly improved by using SQLite instead of their current file format. To demonstrate this, he created a web page that takes the PostgreSQL Git history, converts it to SQLite, and then offers it for searches and analytics that are not possible with the native Git files.
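As a rough illustration of the kind of query such a conversion enables (the schema here is hypothetical, not the one from Hipp's demo):

    -- One row per commit imported from the project's Git history
    SELECT author, count(*) AS commits
      FROM commits
     WHERE commit_date >= '2013-01-01'
     GROUP BY author
     ORDER BY commits DESC
     LIMIT 10;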
The big disagreement between PostgreSQL and SQLite relates to data types. While PostgreSQL has a complex and strictly enforced type system, SQLite types its data dynamically: any column can store strings, numbers, and other values, much like variables in languages such as Perl, Python, and PHP. This difference sparked some discussion between Hipp and a few members of the audience after the talk.
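A minimal sketch of the difference, using a made-up single-column table:

    CREATE TABLE t (x INTEGER);
    INSERT INTO t VALUES (42);       -- accepted by both systems
    INSERT INTO t VALUES ('hello');  -- SQLite stores the string anyway (as text);
                                     -- PostgreSQL rejects it with a type error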
Hipp went on to explain how, despite recent trends, SQL would endure and replace current non-relational database approaches. He cited the evidence of Google's recent return to SQL with BigQuery and SQL interfaces for Hadoop, and quoted Fred Brooks, Rob Pike, and Linus Torvalds in support of the idea of formal data structures. He also called the current "NoSQL" databases "postmodern databases" because they embrace an "absence of objective truth".
Indexing with VODKA
Of course, while the PostgreSQL project may love SQL, recently it has been seeing JSON on the side. The project's team of Russian advanced indexing experts, Teodor Sigaev, Oleg Bartunov, and Alexander Korotkov, presented their latest innovations to the other developers. These new ideas, which include a new indexing data structure and a new query syntax, center around PostgreSQL's new binary JSON data type, JSONB.
![Teodor Sigaev and Oleg Bartunov [Sigaev and Bartunov]](https://static.lwn.net/images/2014/pgcon-vodkadevs-sm.png)
First, though, they presented some of their benchmarking work using the JSONB type and indexes that will be released with version 9.4. For these tests, they loaded 1.2 million bookmarks from the old Delicious database, in JSON form, into a PostgreSQL 9.4 database and into a MongoDB 2.6.0 database for comparison. Search times for a single key were similar: one second for MongoDB vs. 0.7 seconds for PostgreSQL. However, it took 17 times as long to load the data into MongoDB, and the resulting database was 50% larger.
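A single-key search of that kind, assuming a hypothetical table bookmarks with a JSONB column doc, looks roughly like this:

    -- Count bookmarks whose JSON document contains a given tag
    SELECT count(*)
      FROM bookmarks
     WHERE doc @> '{"tags": [{"term": "postgres"}]}';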
Bartunov and Sigaev had added "GIN" indexes to PostgreSQL in 2006. GIN stands for "Generalized Inverted Index", and it is similar in structure and function to the indexes used for searching in Apache Lucene and Elasticsearch. Their new index type is designed for better searching of nested data structures, and is also based on a "to do" item from the original GIN submission message. They named the new indexing method "VODKA", a recursive acronym that stands for "VODKA Optimized Dendriform Keys Array". VODKA replaces some of the B-tree structures inside GIN indexes with a more generalized pointer arrangement based on SP-GiST, another index type they added, in PostgreSQL 9.2.
Most importantly, of course, it allows PostgreSQL users to type: CREATE INDEX ... USING VODKA.
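Sticking with the hypothetical bookmarks table above (the table and index names are made up), the 9.4-style GIN index and its proposed VODKA counterpart would look something like:

    -- PostgreSQL 9.4: a GIN index over the JSONB column
    CREATE INDEX bookmarks_doc_gin ON bookmarks USING gin (doc);

    -- The new access method: same idea, different internal structure
    CREATE INDEX bookmarks_doc_vodka ON bookmarks USING vodka (doc);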
While they will be useful for certain kinds of spatial queries, the primary use of VODKA indexes is expected to be for searching JSONB data. To support this, they have also developed a new matching syntax and operators for JSON which they call "jsquery", a name which will probably need to change to avoid confusion. Jsquery combines with VODKA indexes to support fast index searches for keys and values deep inside nested JSONB values stored in PostgreSQL tables. While PostgreSQL 9.4 will allow searching for nested keys and values inside JSONB (a back-port of jsquery is available for 9.4), it is limited in how complex these expressions can be for fast index searches. VODKA removes these limitations.
The jsquery syntax looks a lot like PostgreSQL's existing full-text search syntax, which is unsurprising since both have the same inventors. For example:
SELECT jsonb_col FROM table1 WHERE jsonb_col @@ 'a.b @> [1,2]';
That query asks: "tell me whether there is a key 'a' containing a key 'b' whose value is an array with at least the values 1 and 2". It would return true for '{"a": {"b": [1,2,3]}}', but false for '{"a": {"e": [2,5]}}' or '{"a": {"b": [1,3]}}'.
They concluded by discussing some of the roadblocks they are facing in current VODKA development, such as index cleanup and an inability to collect statistics on data distribution. This discussion continued at the unconference which took place on Saturday, at the end of PGCon. There was also some discussion about the proposed jsquery syntax, which some developers felt was too different from established JSON query technologies.
Other sessions and the unconference
Of course, there were many other sessions in addition to those mentioned above. There were several presentations about PostgreSQL's new JSON features, including one by the pgRest team from Taiwan, which showed off a complete Firebase/MongoDB replacement built with PostgreSQL and V8 JavaScript. Other talks covered improving test coverage for PostgreSQL, why it's taken so long to implement UPSERT, analyzing core dumps, using PostgreSQL in the Russian Parliament, and how to program the new streaming replication.
The conference then wound up with the second annual PostgreSQL Unconference, which allowed contributors and users to discuss some of the many ideas and issues that had come up during the developer meeting and the main conference. Participants talked about data warehousing and the extension system, and a Hitachi staff member discussed the design of the company's PostgreSQL-based appliance. While half the participants in the unconference were code contributors to PostgreSQL, the other half weren't; those users were excited to have the chance to directly influence the course of development, as explained by Shaun Thomas.
The biggest focus of the day, similar to last year, was discussions about "pluggable storage" for PostgreSQL in order to support column stores, append-only storage, and other non-mainstream options. This topic was introduced by the staff of CitusData, based on limitations they encountered with Foreign Data Wrappers and their cstore_fdw extension, which they released earlier this year. Unfortunately, progress has been slow on pluggable storage due to the many difficult changes required to the code.
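For reference, cstore_fdw plugs in through the ordinary foreign-data-wrapper machinery; a minimal setup (with a made-up table definition) looks roughly like this:

    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
    CREATE FOREIGN TABLE events (
        event_time  timestamptz,
        payload     text
    ) SERVER cstore_server
      OPTIONS (compression 'pglz');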
If last year's PGCon was revolutionary, introducing many of the new developments which would change how the database is used, this year's conference was all about turning those changes into production code. Certainly anyone whose job centers around PostgreSQL should try to attend PGCon. If you couldn't make it, though, slides and audio will be online soon at the PGCon web site.
| Index entries for this article | |
|---|---|
| GuestArticles | Berkus, Josh |
| Conference | PGCon/2014 |
Posted Jun 5, 2014 1:32 UTC (Thu)
by kjp (guest, #39639)
[Link] (5 responses)
Any contributions from Amazon? Getting picked up by RDS (I see it's still in beta) was a big win for you guys.
Posted Jun 5, 2014 1:35 UTC (Thu)
by kjp (guest, #39639)
[Link] (1 responses)
Posted Jun 5, 2014 2:51 UTC (Thu)
by petereisentraut (guest, #59453)
[Link]
Posted Jun 5, 2014 2:50 UTC (Thu)
by petereisentraut (guest, #59453)
[Link]
Posted Jun 5, 2014 18:28 UTC (Thu)
by jberkus (guest, #55561)
[Link]
Well, that's the PostgreSQL project motto: "making hard things possible, saving the easy things for later." ;-)
Posted Jun 5, 2014 18:31 UTC (Thu)
by jberkus (guest, #55561)
[Link]
More seriously, as you can see from the slides, making UPSERT work correctly turns out to be quite hard in a high-concurrency environment. MySQL was willing to impose serialization on their users, partly because they already had to support their original replication system. We can't, because our users wouldn't tolerate it.
Posted Jun 5, 2014 11:31 UTC (Thu)
by dskoll (subscriber, #1630)
[Link] (11 responses)
I don't really see a compelling use-case for UPSERT. You can always do the equivalent of:

    BEGIN;
    UPDATE table SET col1=xxx ... WHERE condition...;
    if (rows_affected == 0) then INSERT INTO table (col1...) VALUES(...);
    COMMIT;

I guess the UPSERT statement would use only one DB round-trip so would be more efficient, but I don't see it as a huge deal.
Posted Jun 5, 2014 11:46 UTC (Thu)
by andresfreund (subscriber, #69562)
[Link] (10 responses)
That statement isn't safe under concurrency:
a) Another transaction inserts a row with a conflicting key in between your UPDATE and INSERT.
b) There already is a conflicting row in the table, but you can't see it yet, because either the other transaction hasn't committed yet, or you're using REPEATABLE READ and the other transaction has committed after you.
Posted Jun 5, 2014 13:13 UTC (Thu)
by dskoll (subscriber, #1630)
[Link] (9 responses)
> That statement isn't safe under concurrency
Yes, but neither can UPSERT be, can it? An UPSERT will either succeed completely or fail completely. Same with my transaction. The failure modes might be a little different, but you can always contrive situations in which there is a failure.
I guess what I'm asking is: Does the presumably-safer operation of UPSERT under concurrency actually matter in real life? I cannot think of a situation where it does, but maybe I just lack imagination. :)
Posted Jun 5, 2014 13:20 UTC (Thu)
by andresfreund (subscriber, #69562)
[Link] (7 responses)
> Yes, but neither can UPSERT be, can it? An UPSERT will either succeed completely or fail completely. Same with my transaction. The failure modes might be a little different, but you can always contrive situations in which there is a failure.
No. The upsert we're talking about *does* handle that correctly. We're still arguing about some implementation details but it's definitely possible to do better. The canonical example for that is: http://www.postgresql.org/docs/current/static/plpgsql-con...
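For reference, that kind of retry loop, assuming a table db(a int primary key, b text) and following the documentation's example, looks roughly like this:

    CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS $$
    BEGIN
        LOOP
            -- First, try to update an existing row.
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- No row yet, so try to insert one.  If another session inserts
            -- the same key first, we get unique_violation and loop to retry.
            BEGIN
                INSERT INTO db (a, b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- Fall through and try the UPDATE again.
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;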
That solution has a relatively high overhead and is not really nice to use, but otherwise it works.
You can do the same from a client connection using savepoints (which internally are used by BEGIN .. EXCEPTION blocks).
> I guess what I'm asking is: Does the presumably-safer operation of UPSERT under concurrency actually matter in real life? I cannot think of a situation where it does, but maybe I just lack imagination. :)
Yes. All the time. If you use something like your code even under moderate concurrency, you'll get failures.
Posted Jun 6, 2014 18:45 UTC (Fri)
by zlynx (guest, #2285)
[Link] (1 responses)
Is that really a problem? I've written a lot of Perl scripts that use PostgreSQL with very similar transactions, and if it fails I have just done a retry. It has always seemed to work well enough.
Yes, I had to add the retry later after realizing it was a potential problem, but since I check all statements for success and fail or log on errors as all programmers should do, I've never had a problem with thinking it succeeded when it didn't.
Posted Jun 7, 2014 9:48 UTC (Sat)
by andresfreund (subscriber, #69562)
[Link]
> Is that really a problem? I've written a lot of Perl scripts that use PostgreSQL with very similar transactions, and if it fails I have just done a retry. It has always seemed to work well enough.
There's cases where that's enough. There are others where it's not. Consider the case where the work done in the transaction isn't cheap - retrying the whole thing because of not having an UPSERTish operation is annoying; another case is operations which should have relatively low latency; web frameworks where it's not always easy to retry the whole transaction; ...
You clearly can survive without it, but it does make life easier.
Posted Jun 12, 2014 2:41 UTC (Thu)
by sbishop (guest, #33061)
[Link] (4 responses)
Posted Jun 12, 2014 6:10 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (3 responses)
In practice, you'll likely get a constraint violation error in one of the transactions. That's actually the way UPSERT is usually implemented in Postgres right now.
Ironically, MSSQL does not have this problem because it locks tables, so a second transaction will be waiting until the first transaction is finished.
Posted Jun 13, 2014 2:36 UTC (Fri)
by sbishop (guest, #33061)
[Link]
Posted Jun 13, 2014 8:09 UTC (Fri)
by intgr (subscriber, #39733)
[Link] (1 responses)
Citation needed. This article contradicts your claim: http://www.mssqltips.com/sqlservertip/3074/use-caution-wi...
> MERGE *looks* like it will take care of concurrency for you, because implicitly it seems to be a single, atomic statement. However, under the covers, SQL Server really does perform the different operations independently. This means that you could end up with race conditions or primary key conflicts when multiple users are attempting to fire the same MERGE statement concurrently.
Posted Jun 14, 2014 13:59 UTC (Sat)
by oldtomas (guest, #72579)
[Link]
OTOH, given the installations I've seen in the wild (not the real showcases, mind you), most DBAs and application writers seem overly conservative and prefer to not know the feature exists.
Posted Jun 5, 2014 13:20 UTC (Thu)
by dskoll (subscriber, #1630)
[Link]
Yeah, ok, never mind. :) I figured it out.