VoltDB launches

Posted May 26, 2010 9:34 UTC (Wed) by ms (subscriber, #41272)
In reply to: VoltDB launches by intgr
Parent article: VoltDB launches

Err, well, if you have 20GB of RAM in your machine and a similarly large dataset, and a few bytes of that dataset change, then once you go past the timeout a new snapshot gets written. That's an awful lot of data to rewrite for no reason at all.


VoltDB launches

Posted May 26, 2010 10:34 UTC (Wed) by intgr (subscriber, #39733)

I think you have missed the point of in-memory databases. If your application changes a few bytes in a 20GB dataset every so often, then by all means, do use disk databases.

In-memory databases are used in workloads where you need deterministic latency, or where write I/O throughput is so high that incurring a disk seek for every change is no longer practical. You can go a long way with write buffering, battery-backed RAID caches and high-end storage devices, but for many applications the tradeoffs of in-memory databases are favorable. Losing the last 5 minutes' worth of changes in the extremely unlikely event of a power failure can be acceptable.

Also consider that writing out 20GB of data takes 5 minutes at most even on the *cheapest* 7200RPM SATA disks (sequential throughput usually exceeds 100 MB/s). But if you have 20GB of RAM in your servers, then you can probably afford much better storage.
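
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The function name and the sample throughput figures other than 100 MB/s are made up for illustration, and it assumes purely sequential writes with decimal units (1 GB = 1000 MB), which real disks will only approximate.

```python
def snapshot_write_seconds(dataset_gb: float, throughput_mb_s: float) -> float:
    """Seconds to write the whole dataset sequentially.

    Hypothetical helper for illustration; assumes 1 GB = 1000 MB and
    ignores seeks, filesystem overhead and competing I/O.
    """
    return dataset_gb * 1000 / throughput_mb_s


if __name__ == "__main__":
    # 100 MB/s is the figure quoted above; 70 and 200 are illustrative.
    for mb_s in (70, 100, 200):
        secs = snapshot_write_seconds(20, mb_s)
        print(f"{mb_s} MB/s: {secs:.0f} s ({secs / 60:.1f} min)")
```

At 100 MB/s the full 20GB takes 200 seconds, so the 5-minute bound leaves some headroom; it would still hold at roughly 70 MB/s.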

VoltDB launches

Posted May 26, 2010 19:35 UTC (Wed) by ms (subscriber, #41272)

The nature of queries is, in general, not uniformly distributed. You may well have a 20GB dataset, but let's suppose the queries you're doing on it vary with which part of the world is awake at the time. Let's also say that you've decided that an RDBMS isn't going to cut it for you, for whatever reason.

Clearly, arbitrarily rewriting the entire dataset over and over again is a waste of time. And please don't forget: a) in the cloud, it's very easy to get lots of RAM, but quite hard to get fast HDD access; and b) it's very likely that cloud providers are going to charge not only for CPU time and network transfer, but also for storage transfer. Now yes, *I've* just brought in the added complication of the cloud, but that's where I see things going - the flexibility and ease of billing are very attractive to sysadmins and COOs alike.
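
As a rough illustration of the storage-transfer point, the sketch below estimates how much data gets written per day if the whole dataset is snapshotted on a fixed timer regardless of how little changed. The helper name is hypothetical, and the 5-minute interval is only an assumption borrowed from the figure mentioned earlier in the thread, not a documented VoltDB setting.

```python
def daily_snapshot_gb(dataset_gb: float, interval_min: float) -> float:
    """GB written per day when a full snapshot is taken every interval.

    Hypothetical helper for illustration; assumes every snapshot rewrites
    the entire dataset, whatever the number of bytes actually changed.
    """
    snapshots_per_day = 24 * 60 / interval_min
    return dataset_gb * snapshots_per_day


if __name__ == "__main__":
    total = daily_snapshot_gb(20, 5)
    print(f"20GB every 5 minutes: {total:.0f} GB written per day")
```

Under those assumptions, that is 288 snapshots and 5760 GB of storage transfer per day, which is the kind of volume that matters if transfer is billed.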

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds