Declarative partitioning in PostgreSQL
Keith Fiske gave a talk (with slides) about the state of partitioning — splitting a large table into smaller tables for performance reasons — in PostgreSQL at SCALE this year. He spoke about the existing support for partitioning, what work still needs to be done, and what place existing partitioning tools, like his own pg_partman, still have as PostgreSQL gains more built-in features.
Partitioning in a database context is when a table is split into multiple smaller tables that each are part of the same logical relation, but contain a smaller physical portion of the data. There are several reasons why someone might want to partition their database, but the most common reasons are to make it easier to manage large amounts of data and to allow databases to reclaim disk space.
![Kieth Fiske [Keith Fiske]](https://static.lwn.net/images/2024/Keith_Fiske-small.png)
In PostgreSQL, deleting a large number of rows does not immediately return the disk space to the filesystem. Instead, those rows are only marked as deleted — because other transactions may still need to read the data. A row can only be reused, or have the backing storage returned to the filesystem, once all the transactions using the row have finished. Even then, only completely unused blocks at the end of a table are freed, so it is quite possible to delete every row in a table except one and see no space reclaimed. On the other hand, dropping an entire table gives the space that it uses back immediately, and is much faster than deleting all the rows involved individually. Therefore, splitting data over multiple smaller tables can make dropping old data that is no longer needed faster. Fiske called this the "primary reason for partitioning in Postgres".
Keeping more, smaller tables has other benefits as well. Several operations take time proportional to the size of a table, such as full table scans when running an SQL query, or the VACUUM process that actually does the work of freeing unused rows for reuse. Having smaller tables reduces the latency of performing a VACUUM, but if some of those tables are not changing, it also reduces the overall time that vacuuming the entire database takes.
The old way
The old way to partition data (prior to PostgreSQL version 10, which was released in 2017), was to manually partition data using table inheritance, triggers, and constraints. Table inheritance is a feature of PostgreSQL that allows one table to inherit from another in a manner similar to object-oriented inheritance, extending the schema of the base table. Queries made against the base table can also return rows from the child table.
But partitioning data in this way was a lot of repetitious work. Fiske said that he "was tired of writing the same thing all the time, for the fourth or fifth customer." Finally, PostgreSQL introduced a new version of partitioning that is built into the database. Fisk clarified that "there may be some narrow use cases where you need to use the old method of partitioning", but he was eager to see any features needed to cover those use cases added to the software.
The new way
The new way to handle data partitioning in PostgreSQL is with declarative partitioning. This lets users set up partitioned tables using data-definition language (DDL) statements:
CREATE TABLE measurement ( city_id int not null, logdate date not null, peaktemp int, unitsales int ) PARTITION BY RANGE (logdate); CREATE TABLE measurement_y2006m02 PARTITION OF measurement FOR VALUES FROM ('2006-02-01') TO ('2006-03-01'); CREATE TABLE measurement_y2006m03 PARTITION OF measurement FOR VALUES FROM ('2006-03-01') TO ('2006-04-01'); ...
There are three kinds of partitioning now supported by PostgreSQL: range, list, and hash partitioning. Shown above is an example of range partitioning. Fisk stated that ranges can be used with "anything you can define an upper or lower boundary for", such as integers, dates, or GUIDs (Globally Unique Identifiers). Each range is inclusive of its lower bound and exclusive of its upper bound, ensuring that edge cases can be sorted appropriately. List-based partitioning involves enumerating which values of a key appear in each partition: "telling it explicitly what goes in a different table".
Hash partitioning works by giving a modulus and remainder that are used to sort rows into child tables based on the hash of selected columns. Fiske said that there were two use cases for hash partitioning. Either "data growing evenly", or "you don't know what your data distribution is going to be in advance". He pleaded with the audience not to use hash partitioning if they do know what the distribution of the data will be.
Hash partitioning has a few drawbacks compared to other methods. For example, "in hash partitioning you cannot add or remove a child without rebuilding" the partitioned table and all its children. In addition, data in a hash-partitioned table "often becomes very unbalanced unless your data is actually random".
PostgreSQL supports redirecting inserted rows from the parent table into the appropriate child table. The feature also handles updates — as of PostgreSQL 11 in 2018, if an update changes which child table a row should go in, it is seamlessly migrated. Support for partitioned tables also extends to query optimizations inside PostgreSQL's query planner. Specifically, when the WHERE clause of a query references the column on which the table is partitioned, the query planner can completely exclude child tables that are not relevant to the query.
Coming soon
Fiske thinks that the implementation of partitioning in PostgreSQL is nearly done. "There's not a lot left, really, as far as base partitioning features". He detailed two pain points that he did hope would be addressed: the need to improve query performance, which is "always going to be there, forever", but is "actively being worked on", and the problem of global indexes.
Indexes are used not only to speed up queries, but also to maintain uniqueness constraints. Right now, the way indexes are designed means that they can only cover a single table. Partitioned tables can have indexes for the columns that are used as partition keys, but otherwise the fact that they are split up into multiple tables makes it impossible to specify uniqueness constraints. Adding global indexes would be useful not just for partitioning, but also for creating other complex cross-table constraints. Fiske said that this would "still take a while to support", but that he doesn't "know enough about the backend" to say what the exact challenges with implementing it are.
There are a few other warts with partitioning. Some properties — such as whether a table uses a write-ahead log — are not inherited properly. Also, there is a bug that has yet to be tracked down, where dropping child tables when foreign keys reference them results in the entire foreign key constraint being dropped.
Despite this, Fiske says partitioning is now a first-class feature in PostgreSQL that is "mostly feature complete". He advised the audience to "tune your queries first" before resorting to partitioning, but that partitioning was a big performance boon to certain use cases.
The place of pg_partman
Fiske is qualified to talk about PostgreSQL partitioning because he has maintained his own PostgreSQL extension to make partitioning easier for many years. pg_partman is an extension that performs many maintenance tasks associated with partitioning, which is provided under the PostgreSQL license.
Using a trigger to create new child tables when required — a common approach, Fiske said — is convenient, but "for a very very busy table, making tables on demand can cause a huge backlog". Making a new child table involves taking a lock on the parent, which can cause delays when many transactions try to make the new table at the same time. pg_partman supports a configurable system for creating new child tables before they're needed, avoiding that contention.
It also includes utilities for partitioning existing tables, for managing tables that would have their names truncated (to PostgreSQL's 63-character limit), and many other details relevant to partitioning. One interesting feature is the ability to scan a child table that has stopped being updated, and automatically create constraints based on what actual data is present in the table, which "dramatically increases performance" of queries made against large tables.
Fiske is still updating pg_partman; version 5.1 should be released by the end of March, and will support proactive child-table creation for list partitions with single-ID values. Despite this, Fiske said that one thing he wanted people to come away with was the understanding that he would much rather these features make their way into PostgreSQL itself. "Please make my extension obsolete," he requested. To that end, Fiske was happy to see a core PostgreSQL contributor in the audience. He also called the code in his extension a kludge, and said that unfortunately the code was not in a fit state to be accepted into core PostgreSQL as-is.
While there remain some small problems to work on, it seems clear that PostgreSQL's support for table partitioning is ready to see more widespread use. The project's development strategy means that additional performance and usability improvements to partitioning are likely to trickle in over time, but PostgreSQL 16, the current supported version, is already quite capable.
[I would like to thank LWN's travel sponsor, the Linux Foundation, for travel assistance to Pasadena for SCALE.]
Index entries for this article | |
---|---|
Conference | Southern California Linux Expo/2024 |
Posted Mar 29, 2024 6:28 UTC (Fri)
by mirabilos (subscriber, #84359)
[Link]
As to the names… ouch, yes, hadn’t thought about that. Is it possible to make it so all child tables go into a separate schema (one schema per parent table) as kind of namespacing?
Posted Apr 11, 2024 22:18 UTC (Thu)
by grifferz (subscriber, #62128)
[Link] (1 responses)
Posted Apr 12, 2024 13:06 UTC (Fri)
by daroc (editor, #160859)
[Link]
Declarative partitioning in PostgreSQL
Declarative partitioning in PostgreSQL
Declarative partitioning in PostgreSQL