PostgreSQL pain points
Posted Mar 26, 2014 20:37 UTC (Wed) by jberkus (guest, #55561)
In reply to: PostgreSQL pain points by marcH
Parent article: PostgreSQL pain points
* Database blocks become difficult to resize and move.
* Can no longer use standard tools like "rsync" with database files.
* The database project now needs a staff to maintain what's basically their own filesystem.
* Can't keep up with hardware advances in a timely fashion.
* Throwing away all of the good stuff developed by Linux IO and FS geeks over the last 20 years.
* Clobbering all other IO-using software on the same machine.
For Postgres, raw partitions aren't even reasonable to contemplate since we'd need to add 5-10 full-time hackers to the community just to build and maintain that portion of the code.
Posted Mar 26, 2014 22:52 UTC (Wed)
by marcH (subscriber, #57642)
[Link] (6 responses)
Yes, agreed totally.
> Throwing away all of the good stuff developed by Linux IO and FS geeks over the last 20 years.
I understand all the "it's more [development] work" arguments that you put in one form or the other. Yes, for sure it is: exactly like the duplication of effort we have in the variety of filesystems (on various operating systems) out there. Some are better at some loads and others at others.
> * Database blocks become difficult to resize and move.
Yes, "virtualization"/layering has pros and cons. But if you really want "bare-metal" performance you know where you have to go.
> * Can no longer use standard tools like "rsync" with database files.
Well, you can't use that on a live database anyway, so this point looks moot. Unless maybe you rely on a filesystem with snapshotting which is... not far from duplicating a database feature! Same pattern again.
> * Can't keep up with hardware advances in a timely fashion.
Sorry I don't get these two. Care to elaborate?
Posted Mar 27, 2014 18:58 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (5 responses)
> Well, you can't use that on a live database anyway, so this point looks moot. Unless maybe you rely on a filesystem with snapshotting which is... not far from duplicating a database feature! Same pattern again.
You can. It's useful for both backup and replication. Basically you can use rsync to quickly update your backup image. And then you take a copy of the WAL logs. The combination gives you a backup. If you have a snapshotting filesystem you can indeed achieve similar effects.
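The rsync-plus-WAL procedure described above can be sketched roughly as follows. This is only an illustration of the ordering, not a vetted backup script: `backup_plan` and all paths are hypothetical, it assumes WAL archiving is already configured to write into `wal_dir`, and it uses the `pg_start_backup`/`pg_stop_backup` function names from the PostgreSQL releases current at the time of this thread (later releases renamed them). It only builds the command list and runs nothing.

```python
from typing import List


def backup_plan(data_dir: str, backup_dir: str, wal_dir: str,
                label: str = "nightly") -> List[List[str]]:
    """Return the commands for an rsync-based base backup, in order.

    Nothing is executed here; the caller decides how to run each step.
    All paths are placeholders supplied by the caller.
    """
    return [
        # 1. Tell the server a base backup is starting.
        ["psql", "-c", f"SELECT pg_start_backup('{label}')"],
        # 2. Copy the live data directory; rsync only transfers what changed
        #    since the previous backup image.
        ["rsync", "-a", "--delete", f"{data_dir}/", f"{backup_dir}/"],
        # 3. Tell the server the copy is finished.
        ["psql", "-c", "SELECT pg_stop_backup()"],
        # 4. Copy the archived WAL needed to bring the image to a
        #    consistent state on restore.
        ["rsync", "-a", f"{wal_dir}/", f"{backup_dir}.wal/"],
    ]
```

The file copy taken in step 2 is not consistent by itself; it is the WAL copied in step 4 that makes the combination usable as a backup.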
> > * Can't keep up with hardware advances in a timely fashion.
> Sorry I don't get these two. Care to elaborate
For the first, consider the effects the rise of SSDs is having on the Linux VFS. That work would need to be replicated in the database. For the second, as a userspace program you don't have a good view of what the rest of the system is doing, hence you might be interfering with other processes. The kernel has the overview.
It's a feature that a database doesn't assume it's the only program on a machine.
Posted Mar 27, 2014 19:32 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (2 responses)
> For the second, as a userspace program you don't have a good view of what the rest of the system is doing, hence you might be interfering with other processes. The kernel has the overview.
How is the raw partition approach worse here? I would intuitively think it makes things better: less sharing.
Anyway: any database of serious size runs on dedicated or practically dedicated hardware, doesn't it?
Posted Mar 28, 2014 22:24 UTC (Fri)
by kleptog (subscriber, #1183)
[Link]
> > For the second, as a userspace program you don't have a good view of what the rest of the system is doing, hence you might be interfering with other processes. The kernel has the overview.
> How is the raw partition approach worse here? I would intuitively think it makes things better: less sharing.
I think it depends on what your goals are. If your goal is to make the absolutely fastest database server possible, then you'd probably want to use raw access on a system with nothing else running.
If your goal is to make a database server that is broadly useful and runs efficiently on a wide variety of systems, then asking the kernel to do its job is the better idea.
PostgreSQL tends to the latter. The gains you can get from raw access are simply not worth the effort and would make PostgreSQL much harder to deploy in many situations. A database server that only works well when it's got the machine to itself is a PITA in many situations.
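For readers unfamiliar with what "asking the kernel to do its job" looks like in practice, here is a minimal sketch: write a block through the page cache and use fsync to force it to stable storage. It is not PostgreSQL's actual code. The helper names and the file name in the usage are hypothetical, the 8192-byte block size is borrowed from PostgreSQL's default page size purely as an example, and a real implementation would also handle short writes.

```python
import os

BLOCK_SIZE = 8192  # PostgreSQL's default page size; used here only as an example


def write_block(path: str, block_no: int, data: bytes) -> None:
    """Write one block via the kernel page cache, then force it to disk."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # The kernel decides how and when to schedule the physical write...
        os.pwrite(fd, data, block_no * BLOCK_SIZE)
        # ...and fsync blocks until it reports the data as flushed.
        os.fsync(fd)
    finally:
        os.close(fd)


def read_block(path: str, block_no: int) -> bytes:
    """Read one block back; the kernel may serve it from its cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)
    finally:
        os.close(fd)
```

Caching, write scheduling, and fairness toward other processes all stay with the kernel here, which is the trade-off the comment above describes.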
Posted Mar 29, 2014 3:37 UTC (Sat)
by fandingo (guest, #67019)
[Link]
I'm not sure a database should be implementing operations necessary for ATA TRIM.
Posted Mar 27, 2014 19:36 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Heh, that was missing.
I am still not convinced that rsync is the ultimate database backup tool. As much as I love rsync it surely does not have the patented exclusivity of incremental copying/backup techniques.
Posted Apr 14, 2014 7:41 UTC (Mon)
by MortenSickel (subscriber, #3238)
[Link]
On the other hand, for any database of a certain size and importance, you probably want to have a separate partition for the database files, so it could be possible to advise using a certain file system with certain parameters to get optimal performance.
Posted Mar 27, 2014 3:34 UTC (Thu)
by zblaxell (subscriber, #26385)
[Link] (2 responses)
LVM.
> Can't keep up with hardware advances in a timely fashion.
Most of that happens below the block device level, so filesystems and raw partitions get it at the same time.
> Can no longer use standard tools like "rsync" with database files.
Databases tend to have their own. You often can't use rsync with a live database file on a filesystem either.
> The database project now needs a staff to maintain what's basically their own filesystem
That private filesystem doesn't have to do much that the database wasn't doing already. You could skip an indirection layer.
Unlike a filesystem, a database is not required to support legacy on-disk data across major releases (your DBA must replicate, dump/restore, or in-place upgrade instead). This means the private database filesystem could adapt more quickly to changes in storage technology compared to a kernel filesystem.
> Throwing away all of the good stuff developed by Linux IO and FS geeks over the last 20 years.
You are assuming that FS geeks are developing stuff that is relevant for databases. A database might be better off freed from the constraints of living with a filesystem layer (and legacy filesystem feature costs) between it and its storage.
OTOH a filesystem might be better after all--but that has to be proven, not assumed.
> Clobbering all other IO-using software on the same machine.
That's also true in the filesystem case.
Posted Mar 27, 2014 6:22 UTC (Thu)
by amacater (subscriber, #790)
[Link] (1 responses)
In a slightly different context - IBM Clearcase did/does something similar.
Result: everyone's worst nightmare if a large disk fails - IBM _might_ be able to recover your life's work if you can send them the entire filesystem.
And yes, dirty pages and flushing are fun :(
Posted Mar 27, 2014 8:00 UTC (Thu)
by marcH (subscriber, #57642)
[Link]
About ClearCase: anyone who has used it (and used other things) knows it was one of the worst pieces of engineering ever. So, if you want to be convincing I suggest not using it as an example in any point you are trying to make.
Posted Mar 28, 2014 1:02 UTC (Fri)
by rodgerd (guest, #58896)
[Link] (2 responses)
Probably the most annoying experience was discovering that ASM doesn't do any kind of sanity check when starting filesystems: after a SAN change, the SAN operator wired some LUNs back to the wrong boxes. With an AAAABAAAA on one server and BBBBABBBB on another, LVM would have simply refused to start the volume group. ASM started and then Oracle would silently coredump every time it tried to work with data on the misplaced disk. Such are the horrors of re-inventing decades of volume management and filesystems.
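The kind of sanity check described above can be illustrated with a toy sketch. This is not how LVM or ASM are implemented: `check_group` and `ForeignDiskError` are hypothetical names, and each disk's on-disk label is reduced to a single letter to match the AAAABAAAA example.

```python
from typing import List


class ForeignDiskError(Exception):
    """Raised when a disk carries another group's label."""


def check_group(labels: List[str], expected: str) -> None:
    """Refuse to start a volume group if any member disk has the wrong label.

    `labels` holds the group label read from each disk, in disk order.
    """
    foreign = [i for i, label in enumerate(labels) if label != expected]
    if foreign:
        raise ForeignDiskError(
            f"disks {foreign} do not belong to group {expected!r}"
        )
```

Called as `check_group(list("AAAABAAAA"), "A")`, this raises an error naming disk 4 instead of starting up and failing later on the misplaced disk.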
Posted Mar 28, 2014 8:31 UTC (Fri)
by marcH (subscriber, #57642)
[Link] (1 responses)
Wild, poorly educated guess: in an ideal world, shouldn't databases be hosted on a trimmed down, "semi-filesystem" which has only management features and none of the duplicated performance stuff which gets in the way? It could be called say, LLVM++ for instance?
Posted Mar 29, 2014 13:56 UTC (Sat)
by kleptog (subscriber, #1183)
[Link]
I wonder if finding a way to expose parts of the JBD (Journalling Block Device) to user space might help with any of this.
So, no raw partitions, please - unless rsync and other file management tools get patched to read them... :-P
Software snapshotting and versioning by intercepting file system calls and writing to a custom intermediate file system level.