Optimizing stable pages
Posted Dec 6, 2012 19:56 UTC (Thu) by dlang (guest, #313)
In reply to: Optimizing stable pages by Jonno
Parent article: Optimizing stable pages
One prime example would be the temporary storage on Amazon Cloud machines. If the system crashes, all the data disappears, so there is no value in having a journaling filesystem, and in many cases ext3 and ext4 can have significant overhead compared to ext2.
Posted Dec 6, 2012 20:58 UTC (Thu) by bjencks (subscriber, #80303)
Posted Dec 6, 2012 21:18 UTC (Thu) by dlang (guest, #313)
Swap has horrible data locality; depending on how things get swapped out, a single file could end up scattered all over the disk.
In addition, your approach puts the file storage in direct competition with every process for memory; you may end up swapping out program data because your file storage 'seems' more important.
Disk caching creates similar pressure, but the kernel knows that cache data is cache and can therefore be thrown away if needed. tmpfs data isn't in that category.
Posted Dec 6, 2012 22:38 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)
We've benchmarked it on Amazon EC2 machines. ext4 without journaling is faster than ext2. There are really no more use cases for ext2/3.
Posted Dec 6, 2012 23:06 UTC (Thu) by andresfreund (subscriber, #69562)
Also (I haven't tried this, though): shouldn't you be able to create an ext4 filesystem without a journal while keeping the other ext4 benefits? According to man tune2fs you can even remove the journal from an existing FS with -O^has_journal. The same is probably true for mkfs.ext4.
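Untested, but something along these lines ought to do it; a minimal Python sketch where /dev/xvdb is a made-up scratch device, so don't point it at anything you care about:

    # Hypothetical sketch: create ext4 without a journal, or strip the journal
    # from an existing (unmounted) ext4 filesystem, by driving the e2fsprogs
    # tools from Python. /dev/xvdb is a placeholder scratch device.
    import subprocess

    DEV = "/dev/xvdb"

    # Fresh filesystem with the has_journal feature cleared:
    subprocess.run(["mkfs.ext4", "-O", "^has_journal", DEV], check=True)

    # Or drop the journal from an existing filesystem, as tune2fs(8) describes:
    subprocess.run(["tune2fs", "-O", "^has_journal", DEV], check=True)

    # Sanity check: 'has_journal' should no longer appear in the feature list.
    out = subprocess.run(["dumpe2fs", "-h", DEV],
                         capture_output=True, text=True, check=True)
    print([line for line in out.stdout.splitlines() if "features" in line.lower()])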
Posted Dec 7, 2012 10:22 UTC (Fri) by cesarb (subscriber, #6266)
IIRC, the default Fedora kernel was configured to always use the ext4 code, even when mounting ext2/ext3 filesystems.
Posted Dec 8, 2012 22:39 UTC (Sat) by man_ls (guest, #15091)
Posted Dec 8, 2012 23:28 UTC (Sat) by dlang (guest, #313)
Ignoring what they say and just looking at it from a practical point of view:
The odds of any new EC2 instance you fire up starting on the same hardware, and therefore having access to the data, are virtually nonexistent.
If you can't get access to the drive again, journaling is not going to do you any good at all.
Add to this the fact that they probably have the hypervisor either do some form of transparent encryption, or return all zeros if you read a block you haven't written to yet (to prevent you from seeing someone else's data), and you now have no reason to even try to use a journal on these drives.
Posted Dec 8, 2012 23:56 UTC (Sat) by man_ls (guest, #15091)
In short: any new EC2 instance will of course get a new instance storage, but the same instance will get the same instance storage.
I understand your last paragraph even less. Why do transparent encryption? Just use regular filesystem options (i.e. don't use FALLOC_FL_NO_HIDE_STALE) and you are good. I don't get what a journal has to do with it.
Again, keep in mind that many instance types keep their root filesystem on local instance storage. Would you run / without a journal? I would not.
Posted Dec 9, 2012 1:11 UTC (Sun) by dlang (guest, #313)
I would absolutely run / without a journal if / is on media that I won't be able to access after a shutdown (a ramdisk, for example).
I don't remember seeing anything in the AWS management console that would let you reboot an instance; are you talking about rebooting it from inside the instance? If you can do that, you don't need a journal, because you can still do a clean shutdown. I don't consider the system to have crashed in that case; I count a crash as the system stopping without being able to do any cleanup (a kernel hang or power-off on traditional hardware).
Posted Dec 9, 2012 1:18 UTC (Sun) by man_ls (guest, #15091)
The AWS console has an option to reboot a machine, between "Terminate" and "Stop". You can also do it programmatically using EC2 commands, e.g. if the machine stops responding.
Posted Dec 9, 2012 1:50 UTC (Sun) by dlang (guest, #313)
I don't think this is what FALLOC_FL_NO_HIDE_STALE is about. FALLOC_FL_NO_HIDE_STALE is about not zeroing blocks that the filesystem has not allocated to a file before; but if you have a disk with a valid ext4 filesystem on it and plug that disk into another computer, you can just read the filesystem.
When you delete a file, the data remains on the disk, and root can go to the raw device and read the data that used to be in the file.
By default, when a filesystem allocates a block to a new file, it zeros out the data in that block; it's this step that FALLOC_FL_NO_HIDE_STALE lets you skip.
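As a hedged illustration of that default behaviour (the file path is arbitrary), blocks newly allocated to a file always read back as zeros:

    # Demonstrates the behaviour FALLOC_FL_NO_HIDE_STALE was proposed to relax:
    # blocks newly allocated to a file read back as zeros, never as stale data
    # left behind on the disk by someone else. The path is arbitrary.
    import os

    path = "/tmp/alloc-demo"
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.posix_fallocate(fd, 0, 1 << 20)      # allocate 1 MiB of blocks
        data = os.pread(fd, 4096, 512 * 1024)   # read from the middle of the range
        assert data == b"\x00" * 4096           # zeros, not leftover disk contents
    finally:
        os.close(fd)
        os.unlink(path)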
If you really had raw access to the local instance storage without the hypervisor doing something, then you could just mount whatever filesystem the person before you left there. To avoid this, Amazon would need to wipe the disks, and since it takes a long time to write a TB or so of data (even on SSDs), I'm guessing that they do something much easier, like some sort of encryption that keeps one instance from seeing data written by a prior instance.
Posted Dec 9, 2012 12:04 UTC (Sun) by man_ls (guest, #15091)
I don't know how Amazon (or the hypervisor) prevents access to the raw disk, where unallocated sectors might be found and scavenged even if the filesystem is erased. I guess they do something clever or we would have heard about people reading Zynga's customer database from a stale instance.
Posted Dec 9, 2012 16:01 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)
Amazon doesn't care about your filesystem. AMIs are just dumps of block devices - Amazon simply unpacks them onto a suitable disk. You're free to use any filesystem you want (there might be problems with the bootloader, but they are not insurmountable).
You certainly can access the underlying disk device.
Posted Dec 10, 2012 1:07 UTC (Mon) by dlang (guest, #313)
> I don't know how Amazon (or the hypervisor) prevents access to the raw disk, where unallocated sectors might be found and scavenged even if the filesystem is erased. I guess they do something clever or we would have heard about people reading Zynga's customer database from a stale instance.
This is exactly what I'm talking about.
There are basically three approaches to doing this without the cooperation of the OS running on the instance (which you don't have):
1. The hypervisor zeros out the entire drive before the hardware is considered available again.
2. The hypervisor encrypts the blocks with a random key for each instance; lose the key, and reading the blocks just returns garbage.
3. The hypervisor tracks which blocks have been written to and only returns valid data for those blocks (sketched below).
I would guess #1 or #2, and after thinking about it for a while I would not bet either way.
#1 is simple, but it takes a while (unless the drive has direct support for trim and effectively implements #3 in the drive; SSDs may do this).
#2 is more expensive, but it allows the system to be reused faster.
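Here is a purely speculative sketch of what #3 could look like; it illustrates the technique, and is not a claim about how Amazon actually implements it:

    # Speculative sketch of approach #3: track which blocks this guest has
    # written and return zeros for everything else, so a new tenant can never
    # read a previous tenant's data. Not a description of Amazon's actual code.
    BLOCK_SIZE = 4096

    class ScrubbedDisk:
        def __init__(self, backing):
            self.backing = backing      # dict of block number -> bytes
            self.written = set()        # blocks this guest has written

        def write_block(self, blockno, data):
            assert len(data) == BLOCK_SIZE
            self.written.add(blockno)
            self.backing[blockno] = data

        def read_block(self, blockno):
            if blockno in self.written:
                return self.backing[blockno]
            return b"\x00" * BLOCK_SIZE  # unwritten blocks read as zeros

    disk = ScrubbedDisk(backing={})
    assert disk.read_block(7) == b"\x00" * BLOCK_SIZE
    disk.write_block(7, b"\xab" * BLOCK_SIZE)
    assert disk.read_block(7) == b"\xab" * BLOCK_SIZE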
Posted Dec 10, 2012 2:55 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Posted Dec 10, 2012 3:27 UTC (Mon) by dlang (guest, #313)
It seems like keeping a map of which blocks have been written to would be rather expensive to do at the hypervisor level, particularly if you are talking about large drives.
Good to know that you should get zeros for uninitialized sectors.
Posted Dec 10, 2012 3:30 UTC (Mon) by Cyberax (✭ supporter ✭, #52523)
Posted Dec 10, 2012 6:18 UTC (Mon) by bjencks (subscriber, #80303)
It's well documented that fresh EBS volumes keep track of touched blocks; to get full performance on random writes you need to touch every block first. That implies to me that they don't even allocate the block on the back end until it's written to.
Not sure how instance storage initialization works, though.
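For what it's worth, "touching" a volume usually just meant reading (or writing) every block once before putting it into service; a minimal sketch with a made-up device name:

    # Hedged sketch of touching every block of a volume so the first-access
    # penalty is paid before the workload starts. /dev/xvdf is a placeholder;
    # this read-only variant is safe for a volume that already has data on it.
    import os

    DEV = "/dev/xvdf"
    CHUNK = 1 << 20     # read 1 MiB at a time

    fd = os.open(DEV, os.O_RDONLY)
    try:
        while os.read(fd, CHUNK):   # loop until EOF, faulting in every block
            pass
    finally:
        os.close(fd)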
Posted Dec 10, 2012 6:34 UTC (Mon) by dlang (guest, #313)
As you say, instance local storage is different.
Posted Dec 9, 2012 1:40 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)
It does not survive stopping the instance through the Amazon EC2 API.
Optimizing stable pages
> One prime example would be the temporary storage on Amazon Cloud machines. If the system crashes, all the data disappears
That is a common misconception, and it is not true. As this Amazon doc explains, data in the local instance storage is not lost on a reboot. Quoting that page:
> However, data on instance store volumes is lost under the following circumstances:
So it is not guaranteed, but it is not ephemeral either: many instance types actually have their root on an instance store. Amazon teaches you to treat it as ephemeral so that users do not rely on it too much. But using ext2 on it is not a good idea unless it is truly ephemeral.
Optimizing stable pages
I am not sure what "dies" means in this context. If the instance is stopped or terminated, the instance storage is lost; if the instance is rebooted, the same instance storage is kept. Usually you reboot machines which "die" (i.e. crash or oops), so you don't lose instance storage.
EC2 (local) instance storage
New instances should not see the contents of uninitialized (by them) disk sectors. That is the point of the recent discussion about FALLOC_FL_NO_HIDE_STALE. The kernel will not allow one virtual machine to see the contents of another's disk, or at least that is what I understand.
EC2 (local) instance storage
When Amazon EC2 creates a new instance, it allocates a new instance storage with its own filesystem. This process includes formatting the filesystem and sometimes copying files from the AMI (image file) to the new filesystem, so any previous filesystems are erased. It is here that zeroing unallocated blocks from the previous filesystem comes into play, which is what FALLOC_FL_NO_HIDE_STALE would mess up.
