User: Password:
|
|
Subscribe / Log in / New account

A report from pgCon 2013

Please consider subscribing to LWN

Subscriptions are the lifeblood of LWN.net. If you appreciate this content and would like to see more of it, your subscription will help to ensure that LWN continues to thrive. Please visit this page to join up and keep LWN on the net.

June 10, 2013

This article was contributed by Josh Berkus

This year's pgCon, which concluded May 25th, included an unusually high number of changes to the PostgreSQL community, codebase, and development. Contributors introduced multiple new major projects which will substantially change how people use PostgreSQL, including parallel query, a new binary document store type, and pluggable storage. In addition, Tom Lane switched jobs, four new committers were selected, pgCon had the highest attendance ever at 256 registrations, and held its first unconference after the regular conference. Overall, it was a mind-bending and exhausting week.

pgCon is a PostgreSQL developer and advanced user conference held in Ottawa, Canada every year in May. It usually brings together most of the committers and major contributors to the project in order to share ideas, present projects and new features, and coordinate schedules and code changes. The main conference days are preceded with various summits, including the PostgreSQL Clustering Summit, the Infrastructure Team meeting, and the Developer Meeting. The latter consists of a closed summit of 18 to 25 top code contributors to PostgreSQL who coordinate feature development and plans for the next version (PostgreSQL 9.4).

Parallel query

The longest and most interesting discussion at the developer meeting was about adding parallel query capabilities to PostgreSQL. Currently, the database is restricted to using one process and one core to execute each individual query, and cannot make use of additional cores to speed up CPU-bound tasks. While it can execute dozens of separate queries simultaneously, the lack of individual-query parallelism is still very limiting for users of analytics applications, for which PostgreSQL is already popular.

Bruce Momjian announced on behalf of EnterpriseDB that its engineering team would be focusing on parallelism for the next version of PostgreSQL. Noah Misch will be leading this project. The project plans to have index building parallelized for 9.4, but most of its work will be creating a general framework for parallelism. According to Momjian, there are three things you need for any parallel operation in the database:

  • An efficient way of passing data to the parallel backends, probably using a shared memory facility.
  • A method for starting and stopping worker processes.
  • The ability for the worker processes to share the reference data and state information of parent process.

The EnterpriseDB team had explored using threads for worker processes, but these were not seen as a productive approach, primarily because the PostgreSQL development team is used to working with processes and the backend code is structured around them. While the cost of starting up processes is high compared to threads, the additional locking required for threading looked to be just as expensive in performance terms. Momjian put it this way:

With threads, everything is shared by default and you have to take specific steps not to share. With processes, everything is unshared by default, and you have to specifically share things. The process model and explicit sharing is a shorter path from where we are currently.

The PostgreSQL developers plan to build a general framework for parallelism and then work on parallelizing one specific database task at a time. The first parallel feature is expected to be building indexes in parallel using parallel in-memory sort. This is an important feature for users because building indexes is often slower than populating the the underlying table, and it is often CPU-bound. It's also seen as a good first task because index builds run for minutes rather than milliseconds, so optimizing worker startup costs can be postponed until later development.

PostgreSQL and non-volatile memory

Major contributor KaiGai Kohei of NEC brought up the recent emergence of non-volatile memory (NVRAM), or persistent memory, devices and discussed ideas on how to take advantage of them for PostgreSQL. Intel engineer Matthew Wilcox further reinforced the message that NVRAM was coming in his lightning talk on the first day of pgCon. NVRAM persists its contents after a system power cycle, but is addressed like main memory and is around 50% as fast.

Initially, Kohei is interested in using NVRAM for the PostgreSQL Write Ahead Log (WAL), an append-only set of files that is used to guarantee transactional integrity and crash safety. This will work with the small sizes and limited write cycles of the early NVRAM devices. For servers with NVRAM, WAL writes would go to a memory region on the device allocated using mmap(). In later generations of NVRAM, developers can look at using it for the main database files.

There are many unknowns about this technology, such as what method can be employed to guarantee absolute write ordering. Developers speculated about whether transactional memory could somehow be employed for this. Right now, the PostgreSQL community is waiting to get its collective hands on an NVRAM device for testing and development.

Disqus keynote

Mike Clarke, Operations Lead of Disqus, delivered the keynote for pgCon this year. Disqus is the leading comment-hosting platform, which is used extensively by blogs and news sites all over the internet. Its technology platform includes Python, Django, RabbitMQ, Cassandra and PostgreSQL.

Most of its over three terabytes of data, including comments, threads, forums, and user profiles, is stored in PostgreSQL. This adds up to over 50,000 writes per second to the database, and millions of new posts and threads a day.

Clarke extolled the virtues of SSD storage. Disqus uses a 6-node master-slave replication database cluster running on fast machines with RAIDed SSD storage. Only SSDs have allowed it to continue scaling to its current size. Prior to moving to SSD-based storage, Disqus was at 100% IO utilization and had continual problems with long IO wait times. Now utilization is down and wait times are around one millisecond.

He also complained about some of the pain points of scaling out PostgreSQL. Disqus uses Slony-I, PostgreSQL's older replication system, which the company has customized to its workload, and feels that it can't afford to upgrade. For that reason, Clarke is eagerly awaiting the new logical replication system expected with PostgreSQL 9.4 next year. He also was unhappy about the lack of standard design patterns for PostgreSQL proxying and failover; everyone seems to build their stack differently. On the other hand, he praised extensions as the best feature of PostgreSQL, since they allow building applications inside the database.

Clarke ended with a request for some additional PostgreSQL features. He wants tools to enable sharded multiserver databases to be built inside PostgreSQL more easily, such as by improving PL/Proxy, the distributed table interface extension for PostgreSQL introduced by Skype. He'd also like to see a query progress indicator, something that was later presented at pgCon by Jan Urbański.

HStore and the future of JSON

During the regular talks, Oleg Bartunov and Teodor Sigaev introduced the prototype of the next version of their "hstore" extension for PostgreSQL. hstore allows storing a simple key-value store, a "hash" or "dictionary", in a PostgreSQL field, and to allow indexing the keys. Today, many users of PostgreSQL and JSON use it to store "flattened" JSON objects so that they can be indexed on all keys. The presentation introduced a new version of hstore which can nest, as well as storing arrays, so that it will be a closer match for fully structured JSON, as well as for complex multi-level hashes and dictionaries in Perl and Python.

This prototype new hstore also supports indexing which enables very fast lookup of keys, values, and even document fragments many levels deep. In their tests, they used the Del.icio.us dataset, which includes 1.2 million bookmark documents, and were able to search out all values matching a complex nesting expression in 0.1 seconds, or all instances of a common key in 0.5 seconds. The indexes are also reasonably sized, at around 75% of the size of the data to which they are attached. Earlier attempts to index tree-structured text data in PostgreSQL and other databases have resulted in indexes which are significantly larger than the base table. Individual hstore fields can be up to 512MB in size.

While many attendees were excited and impressed by the prototype, some were unhappy. Several contributors were upset that the new type wasn't JSON. They argued that the PostgreSQL project didn't need a non-standard type and interface, when what developers want is a binary, indexed JSON type. After extensive discussion, Bartunov and Sigaev agreed to work on JSON either instead of, or in addition to, a new hstore for the next version.

Hopefully, this means that users can expect a JSON type for version 9.4 that supports arbitrary nested key lookup and complex search expressions. This would make PostgreSQL more suitable for for applications which currently use a JSON document database, such as MongoDB or CouchDB. With the addition of compatibility projects like Mongres, users might even be able to run such applications largely unaltered.

Pluggable storage

The final day of pgCon this year was the conference's first-ever "unconference day". An unconference is a meeting in which the attendees select sessions and compose the schedule at the event. Unconferences tend to be more discussion-oriented than regular conferences, and center more around recent events and ideas. Around 75 of the pgCon attendees stayed for the unconference.

One of the biggest topics discussed at the unconference was idea of making PostgreSQL's storage "pluggable". Historically, companies have wanted to tailor the database for particular workloads by adding support for column store tables, clustered storage, graphs, streaming data, or other special-purpose data structures. These changes have created incompatible forks of PostgreSQL, such as Greenplum or Vertica, cutting off any development collaboration with those vendors. Other companies, such as Huawei and Salesforce, who are newly involved in PostgreSQL, would like to be able to change the storage model without forking the code.

The PostgreSQL contributors discussed methods of accomplishing this. First, they discussed the possibility of using the Foreign Data Wrapper (FDW) facility to attach new storage types. Foreign Data Wrappers allow users to attach external data, such as other databases, through a table interface. After some discussion, this was seen as unsuitable in most cases, since users want to actually manage tables, including creation, backup, and replication, through PostgreSQL, not just "have a window" into them. They also want to support creating indexes on different storage types.

If FDW won't work, the developers will need to create a new set of hooks and an API for "storage managers". This was actually supported by early versions of POSTGRES at the University of California, which had prototypes of both an in-memory and a write-once media (WORM) storage manager. However, that code has atrophied and doesn't support most current PostgreSQL features.

For any potential storage, the storage manager would need to support several conventions of PostgreSQL, including:

  • having tuples (rows) which are structured like PostgreSQL's tuples, including metadata
  • being transactional
  • providing a method for resolving data visibility
  • providing a physical row identifier for index building

The PostgreSQL system catalogs would stay on the current conventional native storage, regardless of what new types of storage managers were added.

If implemented, this would be a major change to the database system. It would become possible to use PostgreSQL as a query engine, transaction manager, and interface for very different types of databases, both proprietary and open source. It might even become possible for MySQL "storage engine" vendors, such as Infobright and Tokutek, to port their products. Peter van Hardenberg of Heroku suggested it might also make it possible to run PostgreSQL on top of HDFS.

Committer changes

The most frequently quoted news from pgCon this year was news that Tom Lane, lead committer on PostgreSQL, was changing employers from Red Hat to Salesforce. While announced in a rather low-key way through the Developer Meeting notes and Lane's show badge, this was big enough news that Wired picked it up. Lane had worked at Red Hat for 11 years, having joined to support Red Hat Database, its distribution of PostgreSQL. While Red Hat Database was eventually canceled, Lane stayed on at Red Hat, which was very supportive of his contributions to the project.

Lane's move is more significant in what it says about Salesforce's commitment to PostgreSQL than any real change in his expected activities as a committer. Until now, most commentators have suggested that Salesforce's mentions of PostgreSQL were merely posturing, but hiring Lane suggests that it plans to follow through on migrating away from Oracle Database. Six other Salesforce staff also attended pgCon. Its exact plans were not shared with the community, although it's reasonable to hypothesize from development discussions at the conference that Salesforce plans to contribute substantially to the open-source project, and that pluggable storage is a development target.

Lane memorialized his change of employment by putting his nine-year-old Red Hat laptop bag into the charity auction at the end of pgCon. It sold for $170.

The PostgreSQL Core Team, a six-member steering committee for the project, announced the selection of four new committers to PostgreSQL: Jeff Davis of Aster Data, author of the range types feature in version 9.2; Fujii Masao of NTT Data, main author of the synchronous replication feature; Stephen Frost of Resonate, author of several security features; and Noah Misch of EnterpriseDB, author of numerous SQL improvements. This brings the number of committers on PostgreSQL to twenty.

More PostgreSQL

Of course, there were many other interesting presentations and talks at pgCon. Keith Paskett ran a tutorial on optimizing and using PostgreSQL on ZFS atop OmniOS (an OpenSolaris fork), while other users talked about using PostgreSQL on ZFS for Linux. Jeff Davis presented strategies to use PostgreSQL's new anti-disk-corruption features. Josh McDermott ran another Schemaverse tournament, as a qualifier for the upcoming Defcon tournament. Robert Haas showed the most common failures of the PostgreSQL query planner, and sparked discussion about how to fix them.

On the second full conference day, Japanese community members presented the newly-formed PostgreSQL Enterprise Consortium of Japan, a group of 39 Japanese companies aiming to promote and improve PostgreSQL. This group is currently working on clustered PostgreSQL, benchmarking, and migration tools to migrate from other database systems. And just for fun, Álvaro Hernández Tortosa demonstrated creating one billion tables in a single PostgreSQL database.

Overall, it was the most exciting pgCon I've attended, and shows the many new directions in which PostgreSQL development is going. Anyone there got the impression that the project would be completely reinventing the database within a few years. If you work with PostgreSQL, or are interested in contributing to it, you should consider attending next year.

[ Josh Berkus is a member of the PostgreSQL Core Team. ]


(Log in to post comments)

A report from pgCon 2013

Posted Jun 11, 2013 7:24 UTC (Tue) by tpo (subscriber, #25713) [Link]

Very nice and very interesting article. Thanks a lot!

A report from pgCon 2013

Posted Jun 11, 2013 19:38 UTC (Tue) by dskoll (subscriber, #1630) [Link]

I went to PgCon 2013 and I was somewhat disappointed. I didn't find it as informative as previous years.

I was quite disappointed in the Disqus keynote as it was very light on details. Basically, "we scale our database by using kick-ass hardware" isn't particularly enlightening.

Still, there were a few interesting/useful sessions and PostgreSQL itself rocks.

A report from pgCon 2013

Posted Jun 13, 2013 16:55 UTC (Thu) by jberkus (subscriber, #55561) [Link]

I'm sorry to hear that. If you have a chance, could you please rate the sessions you attended on the pgcon web site? Your ratings will help them pick talks for next year.

A report from pgCon 2013

Posted Jun 13, 2013 19:44 UTC (Thu) by dskoll (subscriber, #1630) [Link]

OK, will do. Let me add that I might have found Pg2013 less useful than in past years because I've been using PostgreSQL for a lot longer than when I first attended. So I know more about it and am less likely to glean amazing insights from talks today than I would have a few years ago. So my comments are, of course, very subjective.

A report from pgCon 2013

Posted Jun 12, 2013 8:58 UTC (Wed) by dune73 (subscriber, #17225) [Link]

Go pg, go!

A report from pgCon 2013

Posted Jun 13, 2013 12:24 UTC (Thu) by marcH (subscriber, #57642) [Link]

> With threads, everything is shared by default and you have to take specific steps not to share. With processes, everything is unshared by default, and you have to specifically share things. The process model and explicit sharing is a shorter path from where we are currently.

I am surprised this was not mentioned too:
"With threads, everything is unsafe by default and you have to take specific steps to be safe. With processes, everything is safe by default, and you have to specifically share things."

For more see the Parallelism Manifesto, "Threads are evil", transactional memory, etc.

As an interesting "coincidence", one of PG's key features is snapshot isolation.


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds