Changes coming in PostgreSQL 17
The PostgreSQL project has released beta versions of PostgreSQL 17 containing several interesting security and usability improvements, alongside the usual performance improvements and bug fixes. If the release proceeds according to the usual timeline, the full release of version 17 is expected in September or October. The most important changes concern what PostgreSQL does when a database superuser has their credentials revoked, and the addition of support for incremental database backups.
Security
PostgreSQL 17 includes a few changes that could affect compatibility, in the name of security. The most noticeable is a change by Jeff Davis to how functions invoked during maintenance operations locate objects in the database. Normally, when an SQL statement refers to a table, data type, function, or other object by name, PostgreSQL determines which object is meant using a search path that defines the order in which schemas are searched. This presents a possible security risk: if someone could influence the contents of the search path, they could make a privileged database operation call a function of their own design that shares its name with a built-in function. So, in PostgreSQL 17, privileged maintenance operations will default to using a safe search path: either the default, which should be fine for most users, or a search path declared during the creation of a function, so that the function author can be sure that each name will always refer to the same object.
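A function author can pin the search path at creation time so that name resolution never depends on the caller's environment. A minimal sketch (the schema, function, table, and column names here are hypothetical):

```sql
-- Fix the search path for this function at creation time, so that
-- "accounts" always resolves to myschema.accounts, regardless of the
-- caller's search_path setting.
CREATE FUNCTION account_total() RETURNS numeric
    LANGUAGE sql
    SECURITY DEFINER
    SET search_path = myschema, pg_catalog, pg_temp
AS $$ SELECT sum(balance) FROM accounts $$;
```

Declaring a search path like this has long been recommended for SECURITY DEFINER functions; the PostgreSQL 17 change applies the same safety logic to functions run by maintenance operations.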
That isn't the only change that makes running maintenance operations safer, however. Nathan Bossart added a new permission, MAINTAIN, that can be granted per-table or per-role. This means that now maintenance operations don't need general administration privileges, and the permissions can be more narrowly scoped. Unlike the search-path change, this change is backward compatible — although it may prompt some administrators to restructure their permission systems.
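The new permission is granted with ordinary GRANT syntax. A sketch, with hypothetical table and role names:

```sql
-- Allow a dedicated role to run maintenance commands (VACUUM, ANALYZE,
-- REINDEX, and so on) on a single table, without broader privileges:
GRANT MAINTAIN ON TABLE orders TO maintenance_bot;

-- Or grant maintenance rights on all relations via the new
-- predefined role:
GRANT pg_maintain TO maintenance_bot;
```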
Another security-related change is a tweak to how PostgreSQL handles the SET SESSION AUTHORIZATION command that allows superusers to temporarily act as other users — the database equivalent of su. In previous versions, anyone who was a superuser at the time their database connection opened could issue the command. This led to the theoretical problem of someone opening a privileged connection, losing their superuser status, and then using their open connection to get it back. Of course, the far more common case is that of a new administrator having to close and reopen their connection to be able to use it. In PostgreSQL 17, Joseph Koshakow fixed this behavior: the command now only cares about a user's superuser status at the time it is issued.
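For reference, the command works like this (the user name is hypothetical); in PostgreSQL 17 the superuser check happens when the first statement below runs, not when the connection was opened:

```sql
-- As a superuser, temporarily act as another user:
SET SESSION AUTHORIZATION alice;
SELECT current_user;          -- reports alice

-- Return to the original identity:
RESET SESSION AUTHORIZATION;
```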
A similar issue affecting logical replicas has also been fixed by Vignesh C. If the user who owns a logical replication subscription has their superuser access revoked, the process that applies the updates for logical replication, the apply worker, will now be automatically restarted, instead of continuing to run with superuser-level access.
Administration
There is also one feature that "must not be regarded as a security feature": a configuration option to disable ALTER SYSTEM, by Jelte Fennema-Nio and Gabriele Bartolini, that LWN covered in April. The feature is a boon for system administrators, however, because it allows them to ensure that PostgreSQL is only configured via files, not via ad-hoc invocations of ALTER SYSTEM.
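The option itself is a single setting, which can only be set in the configuration file (a sketch of the relevant postgresql.conf fragment):

```
# postgresql.conf
# Reject ALTER SYSTEM commands; configuration changes must be made
# by editing this file (or files it includes) instead.
allow_alter_system = off
```

With this set, an ALTER SYSTEM command fails with an error, while other superuser abilities are unaffected — hence the insistence that it is not a security feature.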
Administrators may also enjoy the addition of incremental backups to the pg_basebackup tool by Robert Haas, Jakub Wartak, and Tomas Vondra. Incremental backups are likely to be faster than full backups for most databases, since they only need to record pages of the database that have changed. Which pages have changed is determined using the database's write-ahead log (WAL) — the log of database operations that is used to ensure that writes to the database are durable. A new background process reads the WAL files and produces a summary of which pages have changed. The administrator can configure its behavior using some new settings. Incremental backups may not be possible if something is slowing that process down too much (such as the database dealing with too many writes).
Other complications include the fact that PostgreSQL does not track the relationship between incremental backups; that responsibility is left to the server administrator. They must retain the last full backup, all of the incremental backups made after it, and the WAL segment and history files that would be needed for a normal backup as well. This complexity is somewhat nerve-racking to see in a backup system, but the project does provide pg_combinebackup, a utility that can turn a full backup and a series of incremental backups into a single, up-to-date full backup. So one possible workflow would be to take incremental backups, and use them to update an existing full backup offline, without impacting the operation of the database.
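The whole workflow can be sketched as follows; the backup paths are hypothetical, and the server must have the WAL summarizer enabled (summarize_wal = on in postgresql.conf) before incremental backups are possible:

```shell
# 1. Take a full base backup.
pg_basebackup -D /backups/full

# 2. Later, take an incremental backup relative to the previous one,
#    by pointing at the manifest file that backup produced.
pg_basebackup -D /backups/incr1 --incremental=/backups/full/backup_manifest

# 3. To restore, reconstruct a complete data directory from the chain.
pg_combinebackup /backups/full /backups/incr1 -o /backups/restored
```

Each incremental backup's manifest can in turn serve as the base for the next one, which is why the administrator must keep the entire chain intact.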
Finally, PostgreSQL 17 adds some additional options to the COPY FROM command and EXPLAIN queries. The COPY FROM command allows users to import data from a file into a table. It now has the option to discard rows that would cause an error (such as failing to parse a given type, or causing a constraint violation), as well as options to increase the verbosity of what is logged about the copying process. These changes required more work than one might assume; the final version included in the release ended up including changes from Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He, Yugo Nagata, and Bharath Rupireddy.
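A sketch of the new COPY FROM options, with hypothetical table and file names:

```sql
-- Skip rows whose values fail to parse instead of aborting the whole
-- copy, and log details about each skipped row:
COPY measurements FROM '/tmp/readings.csv'
    WITH (FORMAT csv, ON_ERROR ignore, LOG_VERBOSITY verbose);
```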
EXPLAIN can now be asked to report the memory used by the SQL-query optimizer during execution of a query, and to separately report the cost of converting data for transmission over the network. Its output has also been cleaned up slightly, which should hopefully make EXPLAIN results easier to read. This work also involved contributions from many people: Ashutosh Bapat, Stepan Rutz, Matthias van de Meent, Nazir Bilal Yavuz, Tom Lane, and Dean Rasheed.
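The new options slot into EXPLAIN's existing option list (table name hypothetical; SERIALIZE requires ANALYZE, since the data must actually be produced to measure its conversion cost):

```sql
-- Report planner memory consumption and the cost of converting the
-- result set into its on-the-wire form, alongside the usual timings:
EXPLAIN (ANALYZE, MEMORY, SERIALIZE) SELECT * FROM measurements;
```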
Optimizations
Databases are usually made more performant in one of two ways: optimizing the basic operations of the database, or making the SQL-query planner smarter, so that the database can do fewer basic operations in the first place. PostgreSQL 17 includes both kinds of optimization. Vondra and Van de Meent made it so that block-range indexes can be built in parallel; Peter Geoghegan and Van de Meent made using B-Tree indexes more efficient when looking up groups of values.
Vacuuming the database — removing unused rows and pages from the middle of a table — has also become much more efficient. Melanie Plageman, Heikki Linnakangas, Masahiko Sawada, John Naylor, and Thomas Munro all contributed various fixes that should speed up vacuuming in one way or another. The amount of memory used during vacuuming may go up slightly, because the default buffer size has been increased.
Sometimes developers will use object-relational mappers, or other abstractions that sit on top of the database. Those abstractions can introduce redundancies in their SQL queries that a knowledgeable human is unlikely to write by hand. Some of the improvements to the optimizer in PostgreSQL 17 are aimed at removing those redundancies. For example, David Rowley, Richard Guo, and Andy Fan made it so that IS NOT NULL and IS NULL checks can be automatically removed from queries when a column is known to be NOT NULL. Rowley also added partition pruning for IS UNKNOWN and IS NOT UNKNOWN conditions on boolean columns.
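As an illustration of the NULL-check removal (table and column names hypothetical):

```sql
-- The id column is declared NOT NULL...
CREATE TABLE orders (id bigint NOT NULL, total numeric);

-- ...so the IS NOT NULL condition below is provably always true, and
-- PostgreSQL 17's planner can drop it from the plan entirely rather
-- than evaluating it for every row:
EXPLAIN SELECT * FROM orders WHERE id IS NOT NULL;
```

An ORM is far more likely than a human to emit such a redundant qualifier, which is what makes this class of optimization worthwhile.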
The other improvements are more generally applicable, and include things such as allowing for greater parallelism, allowing some kinds of sub-queries to be converted to joins, and allowing the optimizer to consider the statistics of columns referenced in previous clauses. Overall, users of PostgreSQL should expect to see some modest speedups on queries where the new optimizations apply. The project's recommendations for getting good performance from a database remain the same — primarily, that one should start by running their queries through EXPLAIN to see what steps the database will actually have to go through to answer the query.
Overall, PostgreSQL 17 is more of the same from the PostgreSQL project — a large release with improvements to performance, maintenance, and security, that nonetheless preserves backward compatibility in most cases. Information on where to download the beta release can be found on the PostgreSQL snapshots page. The project does not recommend upgrading production servers yet, but does recommend installing the beta release in non-production environments to test it and ensure there are no lurking regressions.
Posted Aug 13, 2024 15:30 UTC (Tue)
by cen (subscriber, #170575)
Posted Aug 14, 2024 7:24 UTC (Wed)
by Sesse (subscriber, #53779)
Posted Aug 14, 2024 10:33 UTC (Wed)
by cen (subscriber, #170575)
Posted Aug 14, 2024 10:39 UTC (Wed)
by Sesse (subscriber, #53779)
- Giving wrong results
- Completely bogus optimization-wise
- Solved better on a different layer of the database
- Only relevant to old versions of certain databases
So I don't think you'll see this happen in most databases anytime soon. Remember that a query optimizer can only do things that are guaranteed to give the correct result; it needs to be helpful rather than hurtful in the vast majority of cases; and finally, it needs to target a case that is common enough (ideally, in non-broken code) to warrant adding the extra logic.
Posted Sep 27, 2024 7:33 UTC (Fri)
by ringerc (subscriber, #3071)
That way your pagination will actually be correct, at the cost of extra DB resources held busy during user think-time.
I'm not sure quite what you're proposing for a way for the DB to help with pagination w/o keeping a query context around. PostgreSQL already does its best to re-order joins and push down filters to select the desired subset of data from a query, but this is generally not feasible if you're using LIMIT and OFFSET for pagination rather than key-ranges.
Posted Sep 27, 2024 7:41 UTC (Fri)
by ringerc (subscriber, #3071)
This doesn't really make sense with PostgreSQL's heap table structure. You have to read the row to get the key and ensure the row pointed to by the b-tree index is visible, even in an index scan.
The only case this strategy would be likely to be useful at all is where an index-only scan can be performed due to an up-to-date visibility map, and where the query is a trivial ORDER BY key_col LIMIT ... OFFSET ... such that the total ordering is completely dependent on that one index.
In that case I'm not sure if PostgreSQL will skip fetching the heap rows skipped by an OFFSET clause, and I agree it probably could if it doesn't. It generally will skip the much more expensive operation of fetching any external TOASTed data though.
LIMIT / OFFSET pagination is kind of terrible anyway. In most cases where pagination is unavoidable, it's better to use a predicate filter on the ordered attribute, e.g. `SELECT ... FROM sometable WHERE sortkey > $1 ORDER BY sortkey LIMIT 50` - where "$1" is the value of the last entry in the previously-seen page. Not only does this generally execute much more efficiently, but its results won't skip rows if the row-set changes between page requests.
Posted Oct 1, 2024 13:08 UTC (Tue)
by raven667 (subscriber, #5198)
I hadn't thought of that optimization before. I usually skip pagination and just dump whatever the result set is to the client and let them choke on it, but I'll try to keep that technique in my back pocket because it looks useful. I do use the Netbox REST API a lot, which is a Django app, and have to deal with the performance impact of paginated queries; I wonder whether integrating this into the Django ORM for those queries would make any difference, or if all the overhead is in JSON serialization and round trips between pages. GraphQL seems faster, but I don't understand it at all, so it's not that useful in practice when I don't know how to do what I want.
Posted Aug 14, 2024 7:47 UTC (Wed)
by taladar (subscriber, #68407)
While this is a nice sentiment, it would be nice if it were accompanied by some other way to temporarily enable certain logging features. Sometimes you just need to raise the log level for a minute, or enable slow-query logging temporarily, without restarting the entire container stack; currently PostgreSQL unfortunately requires config-file changes for this, unlike a lot of other systems that allow you to change the in-memory values temporarily without changing the config file.
Posted Aug 14, 2024 12:36 UTC (Wed)
by kleptog (subscriber, #1183)
Posted Aug 18, 2024 16:31 UTC (Sun)
by intgr (subscriber, #39733)
Posted Sep 27, 2024 7:42 UTC (Fri)
by ringerc (subscriber, #3071)
This only guards against global change. Per-role ALTER ROLE ... SET, session-level SET, and transaction-level BEGIN; SET LOCAL ... ; remain available.
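For example, a transaction-scoped setting change can serve exactly the temporary-logging use case described above; the setting reverts automatically at commit or rollback (this particular parameter requires superuser-level privileges to change):

```sql
BEGIN;
-- Log the duration of every statement, for this transaction only:
SET LOCAL log_min_duration_statement = 0;
-- ... run the queries of interest ...
COMMIT;  -- the setting reverts here
```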
Query planners are still dumb
Temporary changes?