EC2 (local) instance storage

Posted Dec 10, 2012 1:07 UTC (Mon) by dlang (✭ supporter ✭, #313)
In reply to: EC2 (local) instance storage by man_ls
Parent article: Optimizing stable pages

Amazon doesn't put a filesystem on the device, you do.

> I don't know how Amazon (or the hypervisor) prevents access to the raw disk, where unallocated sectors might be found and scavenged even if the filesystem is erased. I guess they do something clever or we would have heard about people reading Zynga's customer database from a stale instance.

This is exactly what I'm talking about.

There are basically three approaches to doing this without the cooperation of the OS running on the instance (which you don't have):

1. the hypervisor zeros out the entire drive before the hardware is considered available again.

2. the hypervisor encrypts the blocks with a random key for each instance; lose the key and reading the blocks just returns garbage

3. the hypervisor tracks what blocks have been written to and only returns valid data for those blocks.

I would guess #1 or #2, and after thinking about it for a while I would not bet either way.

#1 is simple, but it takes a while (unless the drive supports TRIM directly and so effectively implements #3 in the drive; SSDs may do this).

#2 is more expensive, but it allows the system to be re-used faster.
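
To make #2 concrete, here is a toy sketch of the "lose the key" idea. It is only an illustration: the keystream built from SHA-256 is a stand-in I made up for the example and is NOT real cryptography (a real hypervisor would use a vetted block cipher mode), and the names `xor_block`, `key_a` and `key_b` are hypothetical.

```python
import hashlib
import os


def _keystream(key, block_no, length):
    # Toy keystream: SHA-256 over key, block number and a counter.
    # NOT real cryptography; a stand-in for a proper block cipher mode.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + block_no.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        counter += 1
    return out[:length]


def xor_block(key, block_no, data):
    """Encrypt or decrypt one block (XOR is its own inverse)."""
    ks = _keystream(key, block_no, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


# Tenant A gets a random key; the disk only ever holds ciphertext.
key_a = os.urandom(32)
secret = b"customer database row".ljust(64, b".")
on_disk = xor_block(key_a, 0, secret)

# While the key exists, reads decrypt normally.
assert xor_block(key_a, 0, on_disk) == secret

# Instance terminated: key_a is discarded and tenant B gets a fresh key.
key_b = os.urandom(32)
leaked = xor_block(key_b, 0, on_disk)  # what tenant B reads from the old block
```

The point is that nothing on the disk has to be touched when the instance goes away; forgetting `key_a` is the whole erase step, which is why this approach lets the hardware be reused quickly.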


EC2 (local) instance storage

Posted Dec 10, 2012 2:55 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

They are using #3. Raw device reads of uninitialized areas return zeroes.
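
A minimal sketch of what #3 amounts to, assuming a simple one-bit-per-block map. This is not Amazon's implementation (nothing in this thread says how they do it); `TrackedDisk` and its methods are hypothetical names used only for illustration.

```python
class TrackedDisk:
    """Toy block device that remembers which blocks the current tenant wrote."""

    def __init__(self, num_blocks, block_size=4096, stale=b"\xAA"):
        self.block_size = block_size
        # Pretend the media still holds the previous tenant's data.
        self._media = [stale * block_size for _ in range(num_blocks)]
        self._written = bytearray((num_blocks + 7) // 8)  # one bit per block

    def _is_written(self, n):
        return bool(self._written[n // 8] & (1 << (n % 8)))

    def write(self, n, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        self._media[n] = bytes(data)
        self._written[n // 8] |= 1 << (n % 8)

    def read(self, n):
        # Unwritten blocks never reach the media; the guest just sees zeros.
        if not self._is_written(n):
            return bytes(self.block_size)
        return self._media[n]

    def reassign(self):
        # New tenant: clearing the map is enough, the media is not wiped.
        self._written = bytearray(len(self._written))
```

From inside the instance this is indistinguishable from a freshly zeroed drive, which is why seeing zeros alone cannot tell you whether the provider wipes the disk or tracks writes.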

EC2 (local) instance storage

Posted Dec 10, 2012 3:27 UTC (Mon) by dlang (✭ supporter ✭, #313)

That eliminates #2, but it could be #1 or #3.

It seems like keeping a map of whether each block has been written to would be rather expensive to do at the hypervisor level, particularly if you are talking about large drives.

Good to know that you should get zeros for uninitialized sectors.

EC2 (local) instance storage

Posted Dec 10, 2012 3:30 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)

#1 is unlikely because local storage is quite large (4 TB on some nodes). It's not hard to keep track of dirtied blocks; they need it to support snapshots on EBS volumes anyway.
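
Some back-of-the-envelope numbers for both sides of that argument. The 100 MB/s write rate, the single sequential stream, and the block sizes are assumptions picked for illustration, not measurements of EC2 hardware.

```python
TB = 10**12
MiB = 2**20

disk_bytes = 4 * TB        # the 4 TB figure mentioned above
write_rate = 100 * 10**6   # ASSUMED sequential write speed, bytes/second

# Time to zero the whole thing as one sequential stream.
zero_hours = disk_bytes / write_rate / 3600


def bitmap_mib(disk_bytes, block_size):
    """Size of a one-bit-per-block dirty map, in MiB."""
    blocks = -(-disk_bytes // block_size)  # ceiling division
    return blocks / 8 / MiB


print(f"zeroing 4 TB at 100 MB/s: {zero_hours:.1f} hours")
for bs in (512, 4096, 1024 * 1024):
    print(f"bitmap at {bs}-byte blocks: {bitmap_mib(disk_bytes, bs):.2f} MiB")
```

Under those assumptions, zeroing takes about 11 hours, while the map is roughly 930 MiB at 512-byte granularity, about 116 MiB at 4 KiB, and under 1 MiB at 1 MiB extents. So the cost of tracking depends almost entirely on how coarse the tracking unit is.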

EC2 (local) instance storage

Posted Dec 10, 2012 6:18 UTC (Mon) by bjencks (subscriber, #80303)

Just to be clear, there are two different ways of initializing storage: root filesystems are created from a full disk image that specifies every block, so there are no uninitialized blocks to worry about, while non-root instance storage and fresh EBS volumes are created in a blank state, returning zeros for every block.

It's well documented that fresh EBS volumes keep track of touched blocks; to get full performance on random writes you need to touch every block first. That implies to me that they don't even allocate the block on the back end until it's written to.
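
For what "touch every block" can look like in practice, here is a minimal sketch that simply reads a device from start to finish. Whether a read is enough or a write is required to make the back end allocate a block is an assumption on my part; `touch_every_block` and the 1 MiB chunk size are hypothetical choices for the example.

```python
def touch_every_block(path, block_size=1024 * 1024):
    """Read a device or file start to finish; return the number of chunks read."""
    blocks = 0
    # buffering=0 avoids Python-level buffering; the OS page cache may still apply.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:
                break
            blocks += 1
    return blocks
```

You would point it at the attached volume's device node (the exact name depends on the instance). Writing zeros over the whole volume would be the destructive alternative, and only makes sense before any data is on it.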

Not sure how instance storage initialization works, though.

EC2 (local) instance storage

Posted Dec 10, 2012 6:34 UTC (Mon) by dlang (✭ supporter ✭, #313)

EBS storage is not simple disks; the size flexibility and performance you can get cannot be supported by providing raw access to drives or drive arrays.

As you say, instance local storage is different.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds