And this is relevant to ext4... exactly how?

Posted Nov 30, 2011 21:33 UTC (Wed) by khim (subscriber, #9252)
In reply to: Improving ext4: bigalloc, inline data, and metadata checksums by walex
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

Take a look here. Note the linux version number...



And this is relevant to ext4... exactly how?

Posted Nov 30, 2011 23:16 UTC (Wed) by Lennie (guest, #49641)

Google stores its data on ext4 without a journal:

http://www.youtube.com/watch?v=Wp5Ehw7ByuU

And this is relevant to ext4... exactly how?

Posted Dec 1, 2011 1:01 UTC (Thu) by SLi (subscriber, #53131)

Then again, Google normally has three copies of every piece of important data on different computers, so they're not too concerned about failures caused by not journaling.

And this is relevant to ext4... exactly how?

Posted Dec 1, 2011 1:59 UTC (Thu) by dlang (subscriber, #313)

Journaling (as configured by default on every distro I know of) almost never prevents data loss, at least not directly. All journaling does is keep the filesystem metadata consistent: the metadata may still point at garbage data, but you are much less likely to end up with metadata corrupted in such a way that continued use of the filesystem after a failure corrupts existing data.

And this is relevant to ext4... exactly how?

Posted Dec 1, 2011 3:29 UTC (Thu) by tytso (✭ supporter ✭, #9993)

fsync() in combination with a journal will protect against data loss.
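
A minimal sketch of that pattern in C, using a hypothetical data.tmp/data file pair: write the data, fsync() the file, then rename it into place and fsync() the containing directory so that the rename itself is durable.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Write to a temporary file first. */
        int fd = open("data.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[] = "important bytes\n";
        if (write(fd, buf, sizeof buf - 1) != (ssize_t)(sizeof buf - 1)) {
            perror("write");
            return 1;
        }

        /* Flush the file's data and metadata to stable storage; with a
         * journal the metadata update is committed atomically. */
        if (fsync(fd) < 0) { perror("fsync"); return 1; }
        close(fd);

        /* Atomically replace the old copy, then fsync the directory so
         * the rename survives a crash as well. */
        if (rename("data.tmp", "data") < 0) { perror("rename"); return 1; }
        int dirfd = open(".", O_RDONLY | O_DIRECTORY);
        if (dirfd >= 0) { fsync(dirfd); close(dirfd); }
        return 0;
    }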

But yes, the primary benefit of a journal by itself is avoiding long fsck times. One nice thing with ext4 is that fsck times are typically reduced by a factor of 7-12. So a TB file system that previously took 20-25 minutes might now only take 2-3 minutes.

If you are replicating your data anyway because you're using a cluster file system such as Hadoopfs, and you're confident that your data center has appropriate contingencies to mitigate a simultaneous data-center-wide power loss event (i.e., you have batteries, diesel generators, etc., and you test all of this equipment regularly), then going without a journal may make sense. You really need to know what you are doing, though, and it requires careful design at the hardware level, at the data center level, and in the storage stack above the local disk file system.

And this is relevant to ext4... exactly how?

Posted Dec 2, 2011 18:55 UTC (Fri) by walex (guest, #69836)

One nice thing with ext4 is that fsck times are typically reduced by a factor of 7-12. So a TB file system that previously took 20-25 minutes might now only take 2-3 minutes.

That is the case only for fully undamaged filesystems, that is, the common case of a periodic filesystem check. I have never seen any reports that the new 'e2fsck' is faster on damaged filesystems too. And since a damaged 1.5TB 'ext3' filesystem was reported to take 2 months to 'fsck', even a factor of 10 is not going to help a lot.

And this is relevant to ext4... exactly how?

Posted Dec 2, 2011 19:10 UTC (Fri) by dlang (subscriber, #313)

I've had to do fsck on multi-TB filesystems after unclean shutdowns; they can take a long time, but time measured in hours (up to a couple of days for the larger ones). I suspect that if it is taking you months, you have some other bottleneck in place as well.

And this is relevant to ext4... exactly how?

Posted Dec 3, 2011 0:40 UTC (Sat) by walex (guest, #69836)

An unclean shutdown usually does not leave the filesystem that damaged, although that can happen with a particularly bad unclean shutdown (lots of stuff in flight, for example on a wide RAID) or with RAM/disk errors. The report I saw was not for an "enterprise" system with battery backup, ECC and a redundant storage layer.

And this is relevant to ext4... exactly how?

Posted Dec 2, 2011 21:41 UTC (Fri) by nix (subscriber, #2304)

This has been wrong for years. As long as your filesystem was built with the uninit_bg option (which it is by default), block groups which have never been used will not need to be fscked either, hugely speeding up passes 2 and 5 (at the very least).

Fill up the fs, even once, and this benefit goes away -- but a *lot* of filesystems sit for years mostly empty. Fscking those filesystems is very, very fast these days (I've seen subsecond times for mostly-empty multi-TB filesystems).

And this is relevant to ext4... exactly how?

Posted Dec 2, 2011 22:45 UTC (Fri) by tytso (✭ supporter ✭, #9993)

We could fix things so that as you delete files from a full file system, we reduce the high watermark field for each block group's inode table, which would restore the speedups lost by needing to scan the entire inode table. I haven't bothered to do this, but I'll add it to my todo list. (Or someone can send me a patch; it would be trivial to do this in e2fsck, but we could do it in the kernel, too.)
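
As a rough sketch of what such an e2fsck-side pass might look like (hypothetical code, not anything in e2fsprogs): walk each block group's inode bitmap from the top down and recompute how many trailing inodes are unused, so later fscks can again skip that tail of the inode table.

    #include <stdint.h>

    /* Hypothetical helper: return the new "unused inodes" count for one
     * block group, given its in-use inode bitmap (bit set = inode in use). */
    static uint32_t recompute_itable_unused(const uint8_t *inode_bitmap,
                                            uint32_t inodes_per_group)
    {
        for (uint32_t i = inodes_per_group; i > 0; i--) {
            uint32_t idx = i - 1;
            if (inode_bitmap[idx / 8] & (1u << (idx % 8)))
                return inodes_per_group - i;   /* everything above idx is free */
        }
        return inodes_per_group;               /* the whole group is empty */
    }

Walking the bitmap is cheap compared to reading the group's inode table itself, which is what the watermark lets fsck skip.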

Not all of the improvement in fsck time comes from being able to skip reading portions of the inode table. Extent tree blocks are also far more efficient than indirect blocks, and that accounts for much of the speed improvement when fsck'ing an ext4 file system compared to an ext2 or ext3 file system.
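
A back-of-the-envelope illustration with assumed numbers (4 KiB blocks, a contiguous 1 GiB file; none of this comes from the comment itself): ext2/ext3 needs one 4-byte pointer per data block, stored in indirect blocks that fsck has to read, while a single ext4 extent can describe up to 32768 contiguous blocks.

    #include <stdio.h>

    int main(void)
    {
        const long block_size = 4096;                   /* 4 KiB blocks  */
        const long file_size  = 1L << 30;               /* 1 GiB file    */
        const long blocks     = file_size / block_size; /* 262144 blocks */

        /* ext2/ext3: one 4-byte pointer per block, packed into indirect
         * blocks holding 1024 pointers each (an approximation that
         * ignores the handful of direct pointers in the inode). */
        const long ptrs_per_indirect = block_size / 4;
        printf("indirect blocks to read: ~%ld\n",
               (blocks + ptrs_per_indirect - 1) / ptrs_per_indirect);

        /* ext4: one extent maps up to 32768 contiguous blocks (128 MiB),
         * so a contiguous 1 GiB file needs only a few extents. */
        const long blocks_per_extent = 32768;
        printf("extents needed: %ld\n",
               (blocks + blocks_per_extent - 1) / blocks_per_extent);
        return 0;
    }

This prints roughly 256 indirect blocks versus 8 extents for the same file, which is why pass 1 has far less metadata to read per file on ext4.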

And this is relevant to ext4... exactly how?

Posted Dec 2, 2011 23:35 UTC (Fri) by nix (subscriber, #2304)

We could fix things so that as you delete files from a full file system, we reduce the high watermark field for each block group's inode table

That seems hard to me. It's easy to tell if you need to increase the high watermark when adding a new file; but when you delete one, how can you tell what to reduce the high watermark to without doing a fairly expensive scan?

