
virtualization

Posted Mar 26, 2014 22:52 UTC (Wed) by marcH (subscriber, #57642)
In reply to: PostgreSQL pain points by jberkus
Parent article: PostgreSQL pain points

> * The database project now needs a staff to maintain what's basically their own filesystem

Yes, agreed totally.

> Throwing away all of the good stuff developed by Linux IO and FS geeks over the last 20 years.

I understand all the "it's more [development] work" arguments that you made in one form or another. Yes, for sure it is: exactly like the duplication of effort in the variety of filesystems (on various operating systems) out there, some better at some workloads and others at others.

> * Database blocks become difficult to resize and move.

Yes, "virtualization"/layering has pros and cons. But if you really want "bare-metal" performance you know where you have to go.

> * Can no longer use standard tools like "rsync" with database files.

Well, you can't use that on a live database anyway, so this point looks moot. Unless maybe you rely on a filesystem with snapshotting, which is... not far from duplicating a database feature! Same pattern again.

> * Can't keep up with hardware advances in a timely fashion.
> * Clobbering all other IO-using software on the same machine.

Sorry I don't get these two. Care to elaborate?



virtualization

Posted Mar 27, 2014 18:58 UTC (Thu) by kleptog (subscriber, #1183)

> > * Can no longer use standard tools like "rsync" with database files.

> Well, you can't use that on a live database anyway, so this point looks moot. Unless maybe you rely on a filesystem with snapshotting, which is... not far from duplicating a database feature! Same pattern again.

You can. It's useful for both backup and replication. Basically, you use rsync to quickly update your backup image, and then you take a copy of the WAL logs. The combination gives you a backup. If you have a snapshotting filesystem you can indeed achieve similar effects.
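The reason rsync is quick at refreshing a backup image is that, on repeat runs, it only needs to send the parts of each file that changed. Below is a heavily simplified Python sketch of that idea, assuming fixed-size blocks compared by SHA-256 digest; real rsync chooses its own block size and uses a rolling checksum so it can also find data that moved, so treat this as an illustration only. The names `block_digests`, `changed_blocks` and `update_backup` are hypothetical. Note that a copy taken from a live database is not consistent on its own, which is why the WAL copy is part of the backup.

```python
from __future__ import annotations

import hashlib

# Assumption: fixed 8 kB blocks (PostgreSQL's default page size). Real rsync
# picks its own block size and uses a rolling checksum instead.
BLOCK_SIZE = 8192


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Indices of blocks in `new` that differ from `old` or lie past its end."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [i for i, d in enumerate(new_d) if i >= len(old_d) or old_d[i] != d]


def update_backup(backup: bytes, live: bytes, block_size: int = BLOCK_SIZE) -> tuple[bytes, int]:
    """Bring `backup` up to date with `live`, copying only changed blocks.

    Returns the updated backup and the number of blocks that had to be sent.
    """
    changed = changed_blocks(backup, live, block_size)
    blocks = [backup[i:i + block_size] for i in range(0, len(backup), block_size)]
    n_live = (len(live) + block_size - 1) // block_size
    blocks = blocks[:n_live]  # the live file may have shrunk
    for i in changed:
        chunk = live[i * block_size:(i + 1) * block_size]
        if i < len(blocks):
            blocks[i] = chunk  # overwrite a modified block
        else:
            blocks.append(chunk)  # the live file grew
    return b"".join(blocks), len(changed)


if __name__ == "__main__":
    new, sent = update_backup(b"aaaabbbbcccc", b"aaaaXbbbccccdd", block_size=4)
    print(new, sent)  # b'aaaaXbbbccccdd' 2
```

With 4-byte blocks, only the modified second block and the new trailing block are sent; the unchanged blocks are reused from the existing backup image.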

> > * Can't keep up with hardware advances in a timely fashion.
> > * Clobbering all other IO-using software on the same machine.

> Sorry I don't get these two. Care to elaborate?

For the first, consider the effects the rise of SSDs is having on the Linux VFS. All of that would need to be replicated in the database. For the second, as a userspace program you don't have a good view of what the rest of the system is doing, so you might be interfering with other processes. The kernel has the overview.

It's a feature that a database doesn't assume it's the only program on a machine.

virtualization

Posted Mar 27, 2014 19:32 UTC (Thu) by marcH (subscriber, #57642)

> > * Clobbering all other IO-using software on the same machine.

> For the second, as a userspace program you don't have a good view of what the rest of the system is doing, hence you might be interfering with other processes. The kernel has the overview.

How is the raw partition approach worse here? I would intuitively think it makes things better: less sharing.

Anyway: any database of serious size runs on dedicated or practically dedicated hardware, doesn't it?

virtualization

Posted Mar 28, 2014 22:24 UTC (Fri) by kleptog (subscriber, #1183)

> > > * Clobbering all other IO-using software on the same machine.

> > For the second, as a userspace program you don't have a good view of what the rest of the system is doing, hence you might be interfering with other processes. The kernel has the overview.

> How is the raw partition approach worse here? I would intuitively think it makes things better: less sharing.

I think it depends on what your goals are. If your goal is to make the absolutely fastest database server possible, then you'd probably want to use raw access on a system with nothing else running.

If your goal is to make a database server that is broadly useful and runs efficiently on a wide variety of systems, then asking the kernel to do its job is the better idea.

PostgreSQL tends toward the latter. The gains you can get from raw access are simply not worth the effort and would make PostgreSQL much harder to deploy in many situations. A database server that only works well when it has the machine to itself is a PITA in many situations.

virtualization

Posted Mar 29, 2014 3:37 UTC (Sat) by fandingo (subscriber, #67019)

> How is the raw partition approach worse here?

I'm not sure a database should be implementing operations necessary for ATA TRIM.

virtualization

Posted Mar 27, 2014 19:36 UTC (Thu) by marcH (subscriber, #57642)

> The combination gives you a backup

Heh, that was missing.

I am still not convinced that rsync is the ultimate database backup tool. As much as I love rsync, it surely does not hold a patent on incremental copying/backup techniques.

virtualization

Posted Apr 14, 2014 7:41 UTC (Mon) by MortenSickel (subscriber, #3238)

Then I think you have not looked closely enough into it. As was mentioned earlier, rsync of the database files in combination with the WAL logs gives you a simple backup that is immediately usable. At my previous job, we were heavy users of PostgreSQL and used that as our main backup system. (It also makes setting up replication a snap.)
So, no raw partitions, please - unless rsync and other file management tools get patched to read them... :-P
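For reference, the rsync-plus-WAL backup described above can be written down as a short sequence of steps. Below is a minimal Python sketch that only builds that plan and runs nothing. It assumes WAL archiving (`archive_command`) is already configured and uses the low-level `pg_start_backup()`/`pg_stop_backup()` functions of PostgreSQL releases of this era (renamed, and reworked, as `pg_backup_start()`/`pg_backup_stop()` in PostgreSQL 15). The function name `plan_rsync_base_backup` and the directory layout are hypothetical.

```python
from __future__ import annotations


def plan_rsync_base_backup(data_dir: str, backup_dir: str, wal_archive_dir: str,
                           label: str = "nightly") -> list[tuple[str, object]]:
    """Build (but do not run) the steps of an rsync-based base backup.

    Each step is ("sql", statement) to send over a database connection, or
    ("cmd", argv) to run as a command. Paths and layout are hypothetical.
    """
    safe_label = label.replace("'", "''")  # escape quotes for the SQL literal
    return [
        # Mark the start of a base backup (pre-15 function name).
        ("sql", f"SELECT pg_start_backup('{safe_label}');"),
        # Copy the live data directory; on repeat runs rsync only sends what changed.
        # This copy is inconsistent by itself; the WAL written meanwhile fixes that.
        ("cmd", ["rsync", "-a", "--delete", f"{data_dir}/", f"{backup_dir}/data/"]),
        # Mark the end of the base backup (pre-15 function name).
        ("sql", "SELECT pg_stop_backup();"),
        # Take a copy of the archived WAL needed to make the data copy restorable.
        ("cmd", ["rsync", "-a", f"{wal_archive_dir}/", f"{backup_dir}/wal/"]),
    ]


if __name__ == "__main__":
    for kind, action in plan_rsync_base_backup("/srv/pgdata", "/backup/pg", "/srv/wal_archive"):
        print(kind, action)
```

Keeping the plan as plain data makes it easy to review or dry-run before wiring it up to psql and a real rsync invocation.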

On the other hand, for any database of a certain size and importance, you probably want a separate partition for the database files, so it could be possible to advise using a certain filesystem with certain parameters to get optimal performance.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds