PostgreSQL reconsiders its process-based model

By Jonathan Corbet
June 19, 2023
In the fast-moving open-source world, programs can come and go quickly; a tool that has many users today can easily be eclipsed by something better next week. Even in this environment, though, some programs endure for a long time. As an example, consider the PostgreSQL database system, which traces its history back to 1986. Making fundamental changes to a large code base with that much history is never an easy task. As fundamental changes go, moving PostgreSQL away from its process-oriented model is not a small one, but it is one that the project is considering seriously.

A PostgreSQL instance runs as a large set of cooperating processes, including one for each connected client. These processes communicate through a number of shared-memory regions using an elaborate library that enables the creation of complex data structures in a setting where not all processes have the same memory mapped at the same address. This model has served the project well for many years, but the world has changed a lot over the history of this project. As a result, PostgreSQL developers are increasingly thinking that it may be time to make a change.
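
To make the addressing problem concrete, here is a minimal sketch of the general technique such libraries tend to rely on: storing offsets from the start of the shared region rather than raw pointers. Every name in it (relptr, node, relptr_access(), and so on) is a hypothetical stand-in rather than PostgreSQL's actual API, and the two "mappings" are simulated by copying a buffer instead of attaching real shared memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; this is not PostgreSQL's actual API. */

#define REGION_SIZE 256

/* A "relative pointer": a byte offset from the start of the shared region.
 * Offset 0 is reserved to mean "no node". */
typedef size_t relptr;

typedef struct node {
    int    value;
    relptr next;    /* offset of the next node, or 0 at the end of the list */
} node;

/* Turn an offset into an address that is valid in this particular mapping. */
static void *relptr_access(void *base, relptr off)
{
    assert(off < REGION_SIZE);
    return off == 0 ? NULL : (char *)base + off;
}

/* Build a two-node list inside the region and return the head's offset. */
static relptr build_list(void *base)
{
    const relptr a_off = 64, b_off = 128;   /* arbitrary aligned slots */
    node *a = relptr_access(base, a_off);
    node *b = relptr_access(base, b_off);

    a->value = 1;
    a->next = b_off;
    b->value = 2;
    b->next = 0;
    return a_off;
}

/* Walk the list through whichever mapping `base` refers to. */
static int sum_list(void *base, relptr head)
{
    int total = 0;

    for (node *n = relptr_access(base, head); n != NULL;
         n = relptr_access(base, n->next))
        total += n->value;
    return total;
}

/* Simulate two processes seeing the same bytes at different addresses by
 * copying the region. Returns the sum seen through the second "mapping",
 * or -1 if allocation fails. */
static int relptr_demo(void)
{
    void *map_a = calloc(1, REGION_SIZE);
    void *map_b = malloc(REGION_SIZE);
    int total = -1;

    if (map_a != NULL && map_b != NULL) {
        relptr head = build_list(map_a);
        memcpy(map_b, map_a, REGION_SIZE);
        total = sum_list(map_b, head);
    }
    free(map_a);
    free(map_b);
    return total;
}
```

Because the list stores only offsets, walking it through the second buffer works even though that buffer lives at a different address. In a single address space, ordinary pointers would do the same job without the translation step.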

A proposal

At the beginning of June, Heikki Linnakangas, seemingly following up on some in-person conference discussions, posted a proposal to move PostgreSQL to a threaded model.

I feel that there is now pretty strong consensus that it would be a good thing, more so than before. Lots of work to get there, and lots of details to be hashed out, but no objections to the idea at a high level.

The purpose of this email is to make that silent consensus explicit.

The message gave a quick overview of some of the challenges involved in making such a move, and acknowledged, in an understated way, that this transition "surely cannot be done fully in one release". One thing that was missing was a discussion of why this big change would be desirable, but that was filled in as the discussion went on. As Andres Freund put it:

I think we're starting to hit quite a few limits related to the process model, particularly on bigger machines. The overhead of cross-process context switches is inherently higher than switching between threads in the same process - and my suspicion is that that overhead will continue to increase. Once you have a significant number of connections we end up spending a *lot* of time in TLB misses, and that's inherent to the process model, because you can't share the TLB across processes.

He also pointed out that the process model imposes costs on development, forcing the project to maintain a lot of duplicated code, including several memory-management mechanisms that would be unneeded in a single address space. In a later message he also added that it would be possible to share state more efficiently between threads, since they all run within the same address space.

The reaction of some developers, though, made it clear that the "pretty strong consensus" cited by Linnakangas might not be quite that strong after all. Tom Lane said: "I think this will be a disaster. There is far too much code that will get broken". He added later that the cost of this change would be "enormous", it would create "more than one security-grade bug", and that the benefits would not justify the cost. Jonathan Katz suggested that there might be other work that should have a higher priority. Others worried that losing the isolation provided by separate processes could make the system less robust overall.

Still, many PostgreSQL developers seem to be cautiously in favor of at least exploring this change. Robert Haas said that PostgreSQL does not scale well on larger systems, mostly as a result of the resources consumed by all of those processes. "Not all databases have this problem, and PostgreSQL isn't going to be able to stop having it without some kind of major architectural change". Just switching to threads might not be enough, he said, but he suggested that this change would enable a number of other improvements.

How to get there

Moving the core of the PostgreSQL server into a single address space will certainly present a number of challenges. The biggest one, as pointed out by Haas and others, would appear to be the server's "widespread and often gratuitous use of global variables". Globals work well enough when each server process has its own set, but that approach clearly falls apart when threads are used instead. According to Konstantin Knizhnik, there are about 2,000 such variables currently used by the PostgreSQL server.

A couple of approaches to this problem were discussed. One was pulling all of the global variables into a big "session state" structure that would be thread-local. That idea quickly loses its appeal, though, when one considers trying to create and maintain a 2,000-member structure, so the project is unlikely to go this way. The alternative is to simply throw all of the globals into thread-local storage, an approach that is easy and would work, but heavy use of thread-local storage would exact a performance penalty that would reduce the benefits of the switch to threads in the first place. Haas said that marking globals specially (to put them into thread-local storage, among other things) would be a beneficial project in its own right, as that would be a good first step in reducing their use. Freund agreed, saying that this effort would pay off even if the switch to threads never happens.
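
The two approaches can be sketched side by side. The example below assumes C11's _Thread_local and POSIX threads; the variable queries_run, the session_state structure, and the helper functions are invented for illustration and do not correspond to anything in the PostgreSQL source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical example; none of these names come from PostgreSQL. */

/* With one process per session, a plain "static int queries_run;" is
 * effectively per-session. Under threads it would be shared by everyone,
 * so it is marked thread-local to give each thread its own copy. */
static _Thread_local int queries_run;

/* The alternative: explicit per-session state that callers pass around. */
typedef struct session_state {
    int queries_run;
} session_state;

static void run_query_tls(void)
{
    queries_run++;
}

static void run_query_explicit(session_state *session)
{
    session->queries_run++;
}

typedef struct worker_arg {
    int n_queries;          /* input: how many queries to simulate */
    int tls_result;         /* output: count seen via thread-local storage */
    int explicit_result;    /* output: count seen via the session struct */
} worker_arg;

/* One simulated session, bumping both counters the same number of times. */
static void *worker(void *p)
{
    worker_arg *arg = p;
    session_state session = { 0 };

    assert(arg != NULL);
    for (int i = 0; i < arg->n_queries; i++) {
        run_query_tls();
        run_query_explicit(&session);
    }
    arg->tls_result = queries_run;
    arg->explicit_result = session.queries_run;
    return NULL;
}

/* Run two sessions as threads; returns 1 if each saw only its own counts. */
static int tls_demo(void)
{
    worker_arg a = { .n_queries = 3 };
    worker_arg b = { .n_queries = 5 };
    pthread_t ta, tb;

    if (pthread_create(&ta, NULL, worker, &a) != 0)
        return 0;
    if (pthread_create(&tb, NULL, worker, &b) != 0) {
        pthread_join(ta, NULL);
        return 0;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return a.tls_result == 3 && b.tls_result == 5 &&
           a.explicit_result == 3 && b.explicit_result == 5;
}
```

Marking the variable thread-local is a one-line change per global, which is why it is the easy option. The explicit structure avoids thread-local accesses but means threading a pointer through every caller, which becomes unwieldy at the scale of 2,000 variables.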

But, Freund cautioned, moving global variables to thread-local storage is the easiest part of the job:

Redesigning postmaster, defining how to deal with extension libraries, extension compatibility, developing tools to make developing a threaded postgres feasible, dealing with freeing session lifetime memory allocations that previously were freed via process exit, making the change realistically reviewable, portability are all much harder.
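
One item on that list, session-lifetime memory, lends itself to a small illustration. When a session is a process, exiting releases everything it allocated; when it is a thread, something must track those allocations so they can be released together. The sketch below is a deliberately simple tracking allocator with hypothetical names (session_arena, arena_alloc(), arena_release()); it is not how PostgreSQL manages memory, only an example of the bookkeeping that process exit otherwise provides for free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch; a real server allocator is far more elaborate. */

/* Header placed in front of every allocation. The union forces the payload
 * that follows it to be suitably aligned for any object type. */
typedef union chunk_header {
    union chunk_header *next;
    max_align_t         align;
} chunk_header;

/* Tracks everything allocated on behalf of one session. */
typedef struct session_arena {
    chunk_header *head;
    size_t        live_chunks;
} session_arena;

/* Allocate `size` bytes and remember the chunk in the session's list. */
static void *arena_alloc(session_arena *arena, size_t size)
{
    chunk_header *c;

    assert(arena != NULL);
    if (size > SIZE_MAX - sizeof(chunk_header))
        return NULL;                    /* request would overflow */
    c = malloc(sizeof(chunk_header) + size);
    if (c == NULL)
        return NULL;
    c->next = arena->head;
    arena->head = c;
    arena->live_chunks++;
    return c + 1;                       /* payload starts after the header */
}

/* Free every allocation made for the session, as process exit would have.
 * Returns the number of chunks released. */
static size_t arena_release(session_arena *arena)
{
    size_t freed = 0;

    assert(arena != NULL);
    while (arena->head != NULL) {
        chunk_header *c = arena->head;
        arena->head = c->next;
        free(c);
        freed++;
    }
    arena->live_chunks = 0;
    return freed;
}

/* Allocate three chunks for one "session", then end the session.
 * Returns the number of chunks that were released. */
static size_t arena_demo(void)
{
    session_arena arena = { NULL, 0 };

    (void) arena_alloc(&arena, 100);
    (void) arena_alloc(&arena, 16);
    (void) arena_alloc(&arena, 1);
    return arena_release(&arena);
}
```

A threaded server would call something like arena_release() whenever a session's thread finishes, and any allocation made outside such a tracker would leak for the life of the whole server rather than just one connection.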

An interesting point that received surprisingly little attention in the discussion is that Knizhnik has already done a threads port of PostgreSQL. The global-variable problem, he said, was not that difficult. He had more trouble with configuration data, error handling, signals, and the like. Support for externally maintained extensions will be a challenge. Still, he saw some significant benefits in working in the threaded environment. Anybody who is thinking about taking on this project would be well advised to look closely at this work as a first step.

Another complication that the PostgreSQL developers have in mind is that of supporting both the process-based and thread-based modes, perhaps indefinitely. The need to continue to support running in the process-based mode would make it harder to take advantage of some of the benefits offered by threads, and would significantly increase the maintenance burden overall. Haas, though, is not convinced that it would ever be possible to remove support for the process-based mode. Threads might not perform better for all use cases, or some important extensions may never gain support for running in threads. The removal of process support is, as he noted, a question that can only really be considered once threads are working well.

That point is, obviously, a long way into the future, assuming it arrives at all. While the outcome of the discussion suggests that most PostgreSQL developers think that this change is good in the abstract, there are also clearly concerns about how it would work in practice. And, perhaps more importantly, nobody has, yet, stepped up to say that they would be willing to put in the time to push this effort forward. Without that crucial ingredient, there will be no switch to threads in any sort of foreseeable future.


Aim for the stars

Posted Jun 19, 2023 16:11 UTC (Mon) by Wol (subscriber, #4433) [Link]

> While the outcome of the discussion suggests that most PostgreSQL developers think that this change is good in the abstract, there are also clearly concerns about how it would work in practice.

And you might hit the moon. Aim nowhere and you're going nowhere.

Look at the GIL (was that Python?) and the Big Kernel Lock in linux. Whether you get there or not, a lot of the work on the way sounds like it's worth it in its own right. Like getting rid of all those global variables!

Even being able to break up each process into a bunch of threads for the easy stuff could lead to massive benefits - threading where it works well, processes where they work well.

I wish you all God Speed on the voyage!

Cheers,
Wol

Aim for the stars

Posted Jun 19, 2023 18:18 UTC (Mon) by zoobab (guest, #9945) [Link]

Maybe use ZeroMQ IPC messages between threads?

Aim for the stars

Posted Jun 20, 2023 4:44 UTC (Tue) by j16sdiz (subscriber, #57302) [Link]

ZeroMQ is a big mess when it comes to threading model and error recovery.

It does too much magic behind your back. When it comes to databases, we need more explicit (or flexible) error handling.

Aim for the stars

Posted Jun 19, 2023 20:19 UTC (Mon) by nevyn (subscriber, #33129) [Link]

The Python GIL and the Linux Big Kernel Lock seem like very bad comparisons. In those cases there is/was no parallelism; here there is parallelism, but _maybe_ the scaling is better if you change "everything" and _maybe_ the security/robustness is the same.

This is "closer" to the apache-httpd move, the main difference being I don't know enough about PostgreSQL and the plans to move to imply the outcome will be that bad.

Aim for the stars

Posted Jun 19, 2023 22:22 UTC (Mon) by Wol (subscriber, #4433) [Link]

It wasn't meant as a comparison. The Big Kernel Lock and the GIL enforced "single process". PostgreSQL *is* a single process?

Linux and Python decided that removing that restriction was worthwhile. Whether PostgreSQL succeeds or not, the effort they make towards removing that restriction may well be worthwhile.

Cheers,
Wol

Aim for the stars

Posted Jun 19, 2023 23:18 UTC (Mon) by michaelmior (guest, #165680) [Link]

Postgres scales by coordinating among multiple processes on a single machine. The proposal is to use multiple threads instead of multiple processes.

This is similar to the CPython GIL, but the GIL doesn't enforce a single process. It prevents multiple threads from running concurrently in the same process. In CPython with the GIL, multiple processes are *necessary* to scale CPU-bound code.

Aim for the stars

Posted Jun 20, 2023 4:44 UTC (Tue) by rtpg (subscriber, #114619) [Link]

I would go even further: a good number of people argue for the GIL to stay in Python ~forever, mostly because the mental model is easier and it rules out entire classes of bugs.

The GIL stuck around long enough to allow for async, and so you have async for lots of parallelism in one direction, stuff like multiprocessing in the other. Even heavy calculation stuff is pretty "eh whatever" because in practice it often calls into other libraries which release the GIL.

GILectomy work has seen many many many many false starts, and I think we're learning stuff from it (and it might still be the right way to go in the end!), but it's been tough to find work from those projects that ends up being usable (namely because of new locking patterns needing to be figured out in the alternative).

Aim for the stars

Posted Jun 20, 2023 8:13 UTC (Tue) by NYKevin (subscriber, #129325) [Link]

Another part of the problem is the fact that CPython is "good enough."

Anyone who wants to get rid of the GIL can transpile to C with Cython, annotate any objects that need to be accessed outside the GIL as C types, and then write "with nogil:" to release the GIL. It will run much faster than CPython even if you're single-threaded, and can be done incrementally on a module-by-module basis in most cases.

The main downsides of this strategy are:

* CPython is more mature than Cython.
* CPython has a (slightly) more straightforward build process, especially if you have zero non-stdlib dependencies.
* Cython specifically requires a C compiler.
* C types are not Python types. There are semantic differences. You have to do additional testing if you're converting an existing codebase.
* C is not a terribly complicated language, but if you don't know it at all, then you probably need to learn it first.

But none of those are hard blockers. They're just friction. If you really strongly need to drop the GIL, this is a perfectly reasonable way of doing it. The fact is, most people asking for a GILectomy either haven't looked into alternatives like Cython, don't want free threading badly enough to overcome the activation energy of this strategy, or have already built a large CPU-bound multithreaded application in Python which is too big to annotate, despite the threading docs explicitly saying not to do that.

Aim for the stars

Posted Jun 20, 2023 11:56 UTC (Tue) by eru (subscriber, #2753) [Link]

> and it rules out entire classes of bugs.

Seems to me this applies nicely also to PostgreSQL processes vs threads, because of the address-space separation, and the automatic memory cleanup you get when a sub-process exits. With threads, a bug in one thread may trash the memory of any other thread.

PostgreSQL reconsiders its process-based model

Posted Jun 19, 2023 19:26 UTC (Mon) by raven667 (subscriber, #5198) [Link]

I know nothing of the PostgreSQL internals or the relevant engineering but throwing an opinion out there anyway; is there a way to make a minimal threaded implementation that just covers the necessary features needed for the most extreme large servers where threading could help? If you made a ton of caveats about what features are supportable, i.e. anything not used by the large instances you want to test with, can you reduce the scope of what work is needed to something more manageable that can be iterated on? Steady improvement without taking on a big chunk of risk to rework the whole internal architecture, even if it takes longer, is probably the way to go for an old mature software project like this, right?

PostgreSQL reconsiders its process-based model

Posted Jun 19, 2023 19:45 UTC (Mon) by jhoblitt (subscriber, #77733) [Link]

Semi-seriously, why not port the postgresql sql dialect to use mariadb as the backend? Mariadb (mysql...) has had a robust threaded model and binary redo logs for literally decades.

PostgreSQL reconsiders its process-based model

Posted Jun 19, 2023 19:48 UTC (Mon) by pizza (subscriber, #46) [Link]

> Semi-seriously, why not port the postgresql sql dialect to use mariadb as the backend? Mariadb (mysql...) has had a robust threaded model and binary redo logs for literally decades.

Because it's not Postgresql's "dialect" that matters here, but rather the features and robustness that dialect exposes.

...Mariadb might as well be on another planet in comparison.

PostgreSQL reconsiders its process-based model

Posted Jun 19, 2023 23:19 UTC (Mon) by butlerm (subscriber, #13312) [Link]

I believe the short answer is doing that would be tantamount to the PostgreSQL project throwing away nearly everything they have done for the past couple of decades. In addition, unless MariaDB has made remarkable progress in the past few years it isn't anywhere close to implementing PostgreSQL's full feature set or in particular being able to implement those features in a backward compatible manner with PostgreSQL.

When you get down into the details relational database implementations tend to be remarkably different from each other in terms of more user level aspects (functions, data types, options, apis) than you can count. I think it is safe to say the PostgreSQL developers have not reached quite that level of desperation yet. But if someone wanted to take that on as a software engineering challenge the results would certainly be interesting to read about.

PostgreSQL reconsiders its process-based model

Posted Jun 19, 2023 20:29 UTC (Mon) by flussence (subscriber, #85566) [Link]

Oh this is quite some news. I don't mind early adopting performance features, but…

In Apache httpd I've been using every experimental threaded/event mpm as it becomes available, because the forking model always felt a bit gross to me. But that's software that has had pluggable backends for decades, and even so it's still a bit rough around the edges. I generally trust the Postgres developers to not screw up but I think this kind of change would need two or three major release cycles before I'd feel comfortable turning it on in production.

PostgreSQL reconsiders its process-based model

Posted Jun 20, 2023 11:13 UTC (Tue) by ctg (subscriber, #3459) [Link]

This is all very deja vu.

Back in the day, University Ingres (from which postgres, then postgresql is derived) went commercial with RTI. Version 6 was a major rewrite - going from the multi-process architecture to a multi-threaded one (and also switched to SQL as the "core" language). It wasn't that pretty. RTI didn't survive. Not saying the two things are linked.

One of the things I like(d) about postgresql was that it still had the original multiprocess model, still recognisable from ingres of the early 1980s.

PostgreSQL reconsiders its process-based model

Posted Jun 20, 2023 12:00 UTC (Tue) by rrolls (subscriber, #151126) [Link]

A process-per-client model makes sense when you have under a thousand connected clients and they're all coming from goodness knows where: i.e. when you definitely don't want any security bugs that expose state from one client to another, or indeed allow one client to (intentionally or otherwise) cause a denial-of-service to another.

But if you have a large number of connections coming from what is essentially the _same_ client, as we often seem to do in web services for even the simple purpose of running multiple queries at the same time, then that really shouldn't be using multiple processes.

A threaded model works, I suppose, but an event-driven model would be far more ideal. Allow each client to connect once, and give each client its own process - but then allow that client to spawn however many asynchronous tasks it wishes and receive the results incrementally, rather than blocking the whole connection for every operation and thus requiring multiple connections. IIRC, IMAP works like this.

PostgreSQL reconsiders its process-based model

Posted Jun 20, 2023 15:10 UTC (Tue) by atnot (subscriber, #124910) [Link]

This is generally why you'll see these sorts of places run an instance of e.g. pgbouncer to pool many requests over a single process. That makes significantly more effective use of your processes, but it really doesn't solve the scaling issues.

PostgreSQL reconsiders its process-based model

Posted Jun 20, 2023 20:50 UTC (Tue) by mokki (subscriber, #33200) [Link]

If TLB overhead with shared memory and locks between cooperating processes is too high, why not try to fix it in the kernel?

For example, would something like the opt-in sharing of pages between processes that Oracle has been trying to get into the kernel be the correct option: https://lwn.net/ml/linux-kernel/cover.1682453344.git.khal...

Postmaster would just share the already shared memory between processes (containing also the locks). That explicit part of memory would opt in to thread-like sharing and thus get faster/less TLB switching and lower memory usage, while all the rest of the state would still be per-process and safe.

tl;dr super share the existing shared memory area with kernel patch

All operating systems not supporting it would keep working as is.

PostgreSQL reconsiders its process-based model

Posted Jun 20, 2023 21:19 UTC (Tue) by andresfreund (subscriber, #69562) [Link]

> If TLB overhead with shared memory and locks between cooperating processes is too high, why not try to fix it in the kernel?

I think it's not really an OS issue, but a hardware one. To avoid having to flush the TLB during context switches, Linux uses PCIDs on x86-64. During a context switch, the current logical CPU's PCID is updated to the PCID of the relevant process. But a logical CPU just has a single "active" PCID. I think it's similar on ARM.

But this is a bit outside the area I normally dabble in, so I might be misunderstanding. Or just not know about some newer hardware features linux could utilize.

> For example, would something like the opt-in sharing of pages between processes that Oracle has been trying to get into the kernel be the correct option: https://lwn.net/ml/linux-kernel/cover.1682453344.git.khal...

It'd be nice to have that, to save memory on redundant page table entries for the range of mappings that is going to be the same between all the processes. But I don't think it'd meaningfully improve the TLB hit rate.


Copyright © 2023, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds