Posted May 26, 2010 9:34 UTC (Wed) by ms (subscriber, #41272)
In reply to: VoltDB launches by intgr
Parent article: VoltDB launches
Err, well: if you have 20GB of RAM in your machine and a similarly large dataset, and only a few bytes of that dataset change before the timeout expires and a new snapshot gets written, that's an awful lot of data you're rewriting for no reason at all.
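To make the cost concrete, here is a minimal sketch comparing a naive full snapshot with one that tracks "dirty" pages and writes only those. `PagedStore` and the 4 KiB `PAGE_SIZE` are hypothetical illustrations, not VoltDB's actual snapshot mechanism, and the sketch ignores how a consistent point-in-time view would be taken while writes continue.

```python
# Hypothetical in-memory store that remembers which fixed-size pages changed
# since the last snapshot, so an incremental snapshot can write only those.
PAGE_SIZE = 4096  # assumed page size in bytes


class PagedStore:
    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.dirty: set[int] = set()  # indices of pages modified since last snapshot

    def write(self, offset: int, payload: bytes) -> None:
        # Reject writes that would fall outside (or grow) the fixed-size store.
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("write out of bounds")
        self.data[offset:offset + len(payload)] = payload
        first = offset // PAGE_SIZE
        last = (offset + len(payload) - 1) // PAGE_SIZE
        self.dirty.update(range(first, last + 1))  # mark every touched page

    def full_snapshot_bytes(self) -> int:
        # A naive snapshot rewrites the whole dataset every time.
        return len(self.data)

    def incremental_snapshot_bytes(self) -> int:
        # Write only the dirty pages (the last page may be short), then reset.
        written = sum(min(PAGE_SIZE, len(self.data) - p * PAGE_SIZE) for p in self.dirty)
        self.dirty.clear()
        return written


store = PagedStore(64 * 1024 * 1024)  # 64 MiB stand-in for the 20GB dataset
store.write(1000, b"hello")           # "a few bytes" change
print(store.full_snapshot_bytes(), store.incremental_snapshot_bytes())
```

With five changed bytes, the full snapshot still writes all 64 MiB while the incremental one writes a single 4 KiB page; the page-granular design trades a little write amplification for very cheap bookkeeping.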
Posted May 26, 2010 10:34 UTC (Wed) by intgr (subscriber, #39733)
I think you have missed the point of in-memory databases. If your application changes only a few bytes in a 20GB dataset every so often, then by all means use a disk-based database.
In-memory databases are used in workloads where you need deterministic latency, or where write I/O throughput is so high that incurring a disk seek for every change is no longer practical. You can go a long way with write buffering, battery-backed RAID caches and high-end storage devices, but for many applications the tradeoffs of in-memory databases are favorable. Losing the last 5 minutes' worth of changes in the extremely unlikely event of a power failure can be acceptable.
Also consider that writing out 20GB of data, even on the *cheapest* 7200RPM SATA disks, takes 5 minutes at most (sequential throughput usually exceeds 100 MB/s). And if you have 20GB of RAM in your servers, you can probably afford much better storage.
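The arithmetic behind the "5 minutes at most" figure can be checked with a couple of lines. This sketch assumes decimal units (1 GB = 10^9 bytes) and purely sequential writes with no filesystem overhead; real HDD throughput also varies across the platter, so treat the numbers as rough.

```python
GB = 10**9  # decimal gigabyte, assumed
MB = 10**6  # decimal megabyte, assumed


def snapshot_seconds(dataset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to write a full snapshot sequentially, ignoring seeks and overhead."""
    return dataset_bytes / throughput_bytes_per_s


def min_throughput(dataset_bytes: float, max_seconds: float) -> float:
    """Sequential throughput needed to finish a snapshot within max_seconds."""
    return dataset_bytes / max_seconds


t = snapshot_seconds(20 * GB, 100 * MB)
print(f"20GB at 100 MB/s: {t:.0f} s ({t / 60:.1f} min)")
print(f"needed for 5 min: {min_throughput(20 * GB, 5 * 60) / MB:.1f} MB/s")
```

At 100 MB/s a 20GB snapshot takes about 200 seconds, and finishing within 5 minutes only needs roughly 67 MB/s of sustained sequential throughput.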
Posted May 26, 2010 19:35 UTC (Wed) by ms (subscriber, #41272)
In general, queries are not uniformly distributed. You may well have a 20GB dataset, but let's suppose the queries you run against it vary with which part of the world is awake at the time. Let's also say that you've decided that an RDBMS isn't going to cut it for you, for whatever reason.
Clearly, rewriting the entire dataset over and over again, regardless of how little has changed, is a waste of time. And please don't forget: a) in the cloud, it's very easy to get lots of RAM but quite hard to get fast disk access; and b) cloud providers are very likely to charge not only for CPU time and network transfer, but also for storage I/O. Now yes, *I've* just brought in the added complication of the cloud, but that's where I see things going: the flexibility and ease of billing are very attractive to sysadmins and COOs alike.
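If storage I/O is billed, the volume written per day is what matters. The sketch below compares full snapshots of a 20GB dataset every 5 minutes with writing only the changed data; the 1 MB/s change rate is a purely hypothetical figure, and per-record or page-granularity overhead is ignored.

```python
GB = 10**9  # decimal units, assumed
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60
SNAPSHOT_INTERVAL_S = 5 * 60  # snapshot every 5 minutes, as discussed above


def daily_full_snapshot_bytes(dataset_bytes: int, interval_s: int = SNAPSHOT_INTERVAL_S) -> int:
    """Bytes written per day if the whole dataset is rewritten every interval."""
    snapshots_per_day = SECONDS_PER_DAY // interval_s  # 288 for 5-minute intervals
    return snapshots_per_day * dataset_bytes


def daily_changed_bytes(change_rate_bytes_per_s: int) -> int:
    """Bytes written per day if only changed data is persisted (no overhead)."""
    return change_rate_bytes_per_s * SECONDS_PER_DAY


full = daily_full_snapshot_bytes(20 * GB)
changed = daily_changed_bytes(1 * MB)  # hypothetical change rate
print(f"full snapshots: {full / GB:.0f} GB/day, changes only: {changed / GB:.1f} GB/day")
```

Under these assumptions, full snapshots write 5760 GB a day versus 86.4 GB for the changes alone, a gap that would show up directly on a per-transfer storage bill.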