
bigalloc


Posted Dec 1, 2011 16:56 UTC (Thu) by tialaramex (subscriber, #21167)
In reply to: bigalloc by cmccabe
Parent article: Improving ext4: bigalloc, inline data, and metadata checksums

On I/O errors: Sure. We even catch some implausible signal like SIGBUS if we write to a previously unallocated block on a full filesystem for example. But in practice, other than giving developers like myself a terrible shock when it first happened (a SIGBUS? what the hell did I touch?), such behaviour isn't too troubling for us. If an SSD actually dies, we're out of action for some time no matter what, just as we would be if the RAM failed. We anticipate this happening once in a while; it isn't a reason to give up and go home.

Yes, our locality is fairly poor such that readahead is actively bad news. The data structures which dominate are exactly page-sized. We may end up changing anything from a few bytes to a whole page (and even when we write a whole page we need the old contents to determine the new contents), but the chance we then move on to the linearly next (or previous) page is negligible.

My impression was that readahead would be disabled by suitable incantations of madvise(). Is that wrong? It didn't benchmark as wrong on toy systems, but I would have to check whether we actually re-tested on the big machines.



bigalloc

Posted Dec 1, 2011 20:36 UTC (Thu) by cmccabe (guest, #60281)

> We even catch some implausible signal like SIGBUS if we write to
> a previously unallocated block on a full filesystem for example.

If I were you, I'd use posix_fallocate to de-sparsify (manifest?) all of the blocks. Then you don't have unpleasant surprises waiting for you later.
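For illustration, here is a minimal sketch of that preallocation step in C. The helper name preallocate_file and its parameters are made up for this example; the only real API used is posix_fallocate(), which returns an error number directly rather than setting errno. It assumes the file lives on a filesystem where the call can actually reserve blocks (or where the C library can emulate it).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: make sure the first `len` bytes of `path` are backed
 * by allocated blocks, creating the file if needed.
 * Returns 0 on success or an errno-style error number (e.g. ENOSPC).
 */
int preallocate_file(const char *path, off_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    /* posix_fallocate() reports failure through its return value. */
    int err = posix_fallocate(fd, 0, len);

    close(fd);
    return err;
}
```

If the filesystem is full, the failure should then show up here as ENOSPC at setup time, instead of as a SIGBUS on some later store into the mapping.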

> My impression was that readahead would be disabled by suitable
> incantations of madvise(). Is that wrong? It didn't benchmark as wrong on
> toy systems, but I would have to check whether we actually re-tested on
> the big machines.

I looked at mm/filemap.c and found this:

> static void do_sync_mmap_readahead(...) {
> ...
> if (VM_RandomReadHint(vma))
> return;
> ...
> }

So I'm guessing you're safe with MADV_RANDOM. But it might be wise to check the source of the kernel you're using in case something is different in that version.
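As a rough sketch of what the userspace side might look like (map_random_access is a hypothetical name, and this assumes a shared read-write file mapping, which the thread does not actually spell out):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the first `len` bytes of `path` read-write and
 * shared, then advise the kernel that access will be random.
 * Returns the mapping, or NULL with errno set on failure.
 */
void *map_random_access(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd);                      /* the mapping outlives the descriptor */
    if (p == MAP_FAILED) {
        errno = saved;
        return NULL;
    }

    /* Advice applies to this address range only, not to the whole file. */
    if (madvise(p, len, MADV_RANDOM) != 0) {
        saved = errno;
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Checking the madvise() return value is cheap and at least tells you the advice was accepted; whether it really suppresses readahead on a given kernel is still something to confirm in that kernel's source or by benchmarking.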

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds