A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 20:36 UTC (Thu) by masoncl (subscriber, #47138)
Parent article: A hash-based DoS attack on Btrfs

The broader issue of collisions against crc32c was known when we selected the hash. At the end of the day, I think that if someone has write permissions to your directory they can prevent creation of arbitrary files.

But the looping is a real problem, and Pascal did a great job writing it up.

-chris

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 21:46 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (23 responses)

any thoughts of using siphash as suggested in the blog?

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 22:14 UTC (Thu) by masoncl (subscriber, #47138) [Link] (22 responses)

Without the looping, this boils down to one user with write permissions in a directory being able to prevent the creation of new files. There are lots of ways a user might do that.

I'm certainly happy to add a second hash function if there is a strong case for it, but so far I'm not convinced ;)

-chris

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 22:52 UTC (Thu) by therealmik (guest, #87720) [Link] (20 responses)

That's a cop-out... There are plenty of cases where you'd want users to be able to choose filenames but not be able to DoS administrators (or scheduled tasks) that run over those directories.

While I guess server software, such as web servers, samba, etc could always store a random filename then create an index for it in a database, that's exactly what a directory is supposed to be doing anyway - and the database probably has the same vulnerability in whatever hash it uses.

So assuming that you come around to the fact that filename hash collisions are a bad thing, it leads us to:

- Directories probably need a random hash seed for their index (one for the whole FS probably won't cut it)
- crc is not an appropriate hash, as even with a random seed an attacker can still cause collisions

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 23:06 UTC (Thu) by masoncl (subscriber, #47138) [Link] (19 responses)

So, what you're describing is a secure system where:

1) Users share writable directories (either with each other or with admins)
2) Blocking the creation of specific filenames causes problems
3) Blocking the creation of those filenames by creating a file with that name is not an issue

Right?

File uploads

Posted Dec 13, 2012 23:20 UTC (Thu) by man_ls (guest, #15091) [Link]

You are describing basically a server with file upload enabled, for example to upload user avatars. If the file already exists, just append .1 at the end and use that modified name as the key instead.

I would discard the original name and use a random filename, unless for some reason the original name was valuable for anything (e.g. for later user selection). Such use cases do exist, and besides there are many software packages that work this way. That using a new shiny filesystem would automatically convert them from ugly software to DoS targets is not admissible IMO.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 0:20 UTC (Fri) by cyanit (guest, #86671) [Link] (17 responses)

The most obvious way of implementing a file hosting service consists in creating a directory named after the user-chosen username or e-mail address, and putting files there with the user-chosen filename... oops...

In addition to SipHash, taking the latest 64 bits of AES-CBC ciphertext of the zero-terminated filename should work and might be faster if the CPU can accelerate AES.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 0:48 UTC (Fri) by wahern (subscriber, #37304) [Link] (16 responses)

Doubtful. The key setup would big the biggest bottleneck.

Siphash was designed for this exact scenario. Dan Bernstein isn't known for his lack of performance insights. The paper has gobs of analysis of performance characteristics.

Using a CRC as a hash is just stupid. And I say that as someone who learned that the hard way. People only used it because of the lack of a decent alternative--newer hashes like murmur, etc, didn't have enough proven guarantees to inspire confidence. Siphash was designed to be _the_ go to hash for data structures, with both decent performance and a rigorous security analysis.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:06 UTC (Fri) by masoncl (subscriber, #47138) [Link] (7 responses)

Going back to the original example, which programs include all three of my conditions? The one example listed here already has a collision scheme included, so the program in theory can get around it.

This isn't a new issue, reiserfs had a similar hashing scheme. Other hashes are different, but crc32c is implemented in hardware on newer CPUs, so there is a real performance benefit here.

-chris

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:21 UTC (Fri) by therealmik (guest, #87720) [Link] (6 responses)

So the obvious (although contrived) example, would be a CMS where the username is prefixed to the filename (rather than put in a subdirectory).

Another example would be somebody just wanting to hog a lot of in-kernel CPU time, and causing conflicts with their own filename, even though they're rate-limited by higher-level schemes (eg. server-level rate-limits).

Creating a conflict with a lock file created in the same directory is probably another example.

As a previous poster alluded, crc32c is good for detecting random corruption, not for hash tables or malicious corruption.

The CPU implementation of crc32c argument is also deeply flawed - extra tree walks are not worth the few cycles saved, as tree walks are vastly more expensive.

Remember that hash tables are only O(1) in a "perfect hash" scenario, so the hash table performance approaches what you'd expect from O(n) as the hash gets worse.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 3:30 UTC (Fri) by davidescott (guest, #58580) [Link] (5 responses)

I don't think Chris Mason really needs to be reminded how hash tables work. I'll trust he is up on the theory.

Chris (if you are still monitoring the thread),
How would one go about strengthening the FS and making it more resistant to these kinds of attacks? I gather that BTRFS is capable of supporting other hashes -- what are the various options down the road? It it an option at filesystem creation time to use a more collision resistant hash? How big a hash can one use? If one had a hardware SHA-512 and wanted to waste the disk space could they? Are there ways to salt hashes using something secret in the directory structure? Could the Hash be replaced with red-black trees etc?

Given the current on disk layout where would the introduction of protections against this kind of attack introduce btrfs2?

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 12:35 UTC (Fri) by masoncl (subscriber, #47138) [Link] (4 responses)

We can easily add a different hash to the directory format. There are 64 bits available for the hash, although you're limited to 32 bits on 32 bit userland thanks to the readdir/seekdir/telldir interfaces.

Salting etc are also more than possible. I personally don't think the hash makes the shared directories in any way secure, but I've been wrong many times before ;)

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 16:09 UTC (Fri) by cesarb (subscriber, #6266) [Link]

I believe the ext* family uses salting (the "Directory Hash Seed" field you can see in dumpe2fs). This should be enough to make the hash unpredictable as long as the attacker does not know the seed, and the only way to know the seed should be having raw block access to the partition and reading the superblock.

But if the hash output is visible via telldir, the attacker can still find out the salt unless you are using strong crypto, which would not be very good for performance. I think the hash output is not visible via telldir on the ext* family.

You could play games like having a per-directory randomly chosen salt, but if you are getting to that point IMHO it is better to work on making the algorithms robust against degenerate edge cases.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 17:01 UTC (Fri) by davidescott (guest, #58580) [Link] (2 responses)

It might not be a bad idea to salt, even if the salt can be determined through telldir. As you pointed out anyone running local code has much more obvious methods of attack (touch "filename" instead of touch "hash_collision_with_filename"), and a per directory salt would prevent most of the remote attacks suggested in the comments.

Hopefully nobody exports a "tell/read/seekdir" interface over php.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 17:35 UTC (Fri) by dakas (guest, #88146) [Link] (1 responses)

Uh, with a CRC salting a whole directory with the same salt will cause exactly the same collisions as without salt. The salt will just cause the final CRC to be XORed with a constant.

A CRC is not a cryptographic hash. Salting is useless, and identical prefixes have identical impact on the result.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 21:56 UTC (Fri) by masoncl (subscriber, #47138) [Link]

I should have made it clear that salting only makes a difference if we add an optional second hash.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:13 UTC (Fri) by cyanit (guest, #86671) [Link] (7 responses)

But the key setup only needs to be done at filesystem mount time; that said, I have no idea how AES-NI accelerated AES-CBC compares to SipHash in practice.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:54 UTC (Fri) by cyanit (guest, #86671) [Link] (6 responses)

So, AES-128 takes 80 cycles of latency on Sandy Bridge with AES-NI for each 16 byte block vs SipHash's claim of 134 cycles per 16 bytes on the best 64-bit CPU they give data for.

However, if other instructions can be scheduled by the out-of-order CPU in the meantime, then AES should be much faster, since the reciprocal throughput is actually 10 cycles on Sandy Bridge and should only take up one execution unit.

Without AES instructions things are most likely reversed.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 11:23 UTC (Fri) by daniel (guest, #3181) [Link] (4 responses)

A few tens of nanoseconds in either case. That is spectacularly good performance. I'll take a hash designed by a hash expert any day over a cheap hack or something I rolled myself that seems to pass my tests. Been there, wish I'd had Siphash then. Using CRC for the hash was just a bad idea, Chris must have been having a bad day that day. Best time to fix it is now, it only gets harder later.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 15:27 UTC (Fri) by drag (guest, #31333) [Link] (3 responses)

BTRFS folks responded that supporting other hashing algorithms is on their list of patches to apply.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 2:11 UTC (Sat) by daniel (guest, #3181) [Link] (2 responses)

Anyway, it is apparent that issue is not merely a linear increase in directory operation time, but a for-real bug that looks like some kind of live-lock. The hash can and should be fixed, because although the attack in question turned out to achieve something other than its original intent, there is no question that trivially being able to force hash collisions is a vulnerability.

The most important result here is not the inappropriateness of CRC as a hash, but that Btrfs is a complex beast, still in development, a critical bug was just exposed, and likely a few more remain. Given wide enough testing and continued developer commitment it will become stable, but today it is not. From where I sit, it looks like Ext4 will be wearing its standard Linux filesystem crown for some time yet.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 6:17 UTC (Sat) by SEJeff (guest, #51588) [Link]

Unless tux3 matured first *poke*

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 13:18 UTC (Sat) by vonbrand (subscriber, #4458) [Link]

From a cursory look over, I'd say 220 *minutes* isn't just a bad hash degenerating to a linear search...

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 2:46 UTC (Sat) by wahern (subscriber, #37304) [Link]

But AES-NI isn't the complete algorithm. Siphash numbers are for the complete operation, including compression and output. You could probably cobble something basic together with the internal AES mixer (what AES-NI optimizes), but then you've just invented your own algorithm with unknown characteristics, all in an attempt to gain negligible performance improvements. That sort of premature optimization is what causes these sorts of bugs in the first place.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 10:30 UTC (Fri) by robert_s (subscriber, #42402) [Link]

Please see my above comment about zipfiles

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 8:19 UTC (Sat) by eupator (guest, #44581) [Link] (1 responses)

> The broader issue of collisions against crc32c was known when we selected
> the hash. At the end of the day, I think that if someone has write
> permissions to your directory they can prevent creation of arbitrary files.

The problem is far from being limited to such case. Besides the already mentioned unpacking of archives, mirroring remote directories is very common - consider rsync, recursive wget, or version control systems.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 17:49 UTC (Sat) by Karellen (subscriber, #67644) [Link]

Hmmm.....version control systems. That provokes an interesting thought.

Create a Linux patch series, preferably btrfs-related patches for maximum lulz, whose git SHA-1 ids (any blob, tree, or commit ids should be fine) end up generating filenames in the same object-subdirectory with CRC32 collisions, and send a pull request to lkml.

Hilarity ensues!

The object-subdirectory requirement means there's a 40-bit keyspace to search, rather than 32-bit, but one could probably find enough whitespace/punctuation entropy in comments to generate enough collisions to produce at least some effect, even if you can't find enough to lock up a system.

Also, AIUI, pull requests are unconventional and might be ignored, but if you send the patch series directly to the list, the implicit rebasing will destroy the object ids (and therefore the collisions) - unless all the collisions are in file blobs and none of the files have been touched before your patches are applied.

Anyone any idea how plausible/expensive that attack would be?