LWN.net Logo

A hash-based DoS attack on Btrfs

Pascal Junod has disclosed a pair of denial-of-service attacks against the Btrfs filesystem based on hash collisions. "I have created several files with random names in a directory (around 500). The time required to remove them is negligible. Then, I have created the same number of files, but giving them only 55 different crc32c values. The time required to remove them is so large that I was not able to figure it out and killed the process after 220 minutes (!)." This is a local attack only, but administrators of Btrfs-using sites with untrusted users may want to pay attention.
(Log in to post comments)

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 17:33 UTC (Thu) by cyanit (guest, #86671) [Link]

It's not local if you have a web server storing files with partially user selected names.

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 20:00 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

It seems like it's a little stronger constraint than "partially user selected". The users need enough knowledge of one part of the filename plus the ability to influence the other part to force the hash. Certainly anybody who hasn't specifically thought about this risk (and why would they?) should re-evaluate either their file naming or their use of btrfs.

Today's web browsers seem to prefer to give e.g. PDFs downloaded for viewing the filename chosen by the provider unless it is taken. That ought to allow you to wreck a vulnerable btrfs filesystem by giving carefully chosen names to a series of apparently interesting PDFs.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 17:29 UTC (Fri) by dakas (guest, #88146) [Link]

It seems like it's a little stronger constraint than "partially user selected". The users need enough knowledge of one part of the filename plus the ability to influence the other part to force the hash.
Sure? If several last parts of the filename have the same length and CRC and are combined with the same first part of the filename (even if this first part is unknown), why would the resulting CRC not clash? We are not talking about cryptographic hashes here.

A hash-based DOS attack on Btrfs

Posted Dec 17, 2012 13:15 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

"The name consists of a constant prefix plus my user-selected suffix" would constitute enough information. I was thinking more of cases where the prefix /varies/, e.g. because it's used for content addressing, or to serial number the files, or whatever other scheme is in use.

Most code I've worked on that downloaded arbitrary files and gave them names based on their origin (and thus which could be vulnerable to this attack) prefixed the filenames with some varying code or number such as a node identifier. So a naive "assume constant prefix" wouldn't hurt us, but if you could guess our naming scheme (and if we used btrfs) there's definitely a window.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 7:23 UTC (Fri) by eduperez (guest, #11232) [Link]

Is it that common for web applications to store user-supplied files using user-supplied names?

As a web developer, that situation has always seemed very scary to me: too many "what if" conditions that I could miss and leave the server open for attacks. I prefer to store such files using application-generated names, and (if needed) store the user-supplied name in a database.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 10:22 UTC (Fri) by robert_s (subscriber, #42402) [Link]

People seem to be overlooking zipfiles/tarballs.

It wouldn't be particularly difficult to trick a user into unpacking an archive that had 500 crazily named files - all of a sudden, they're there on their filesystem.

And I'm sure there are plenty of web services that accept zips of multiple files to save a user having to upload them all individually. Some may unpack them all in-memory, only saving out recognized files somewhere else (probably with a different filename). But I'm sure many unpack them ("blindly") in a temporary directory and deal with them from there.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 1:00 UTC (Sat) by naptastic (subscriber, #60139) [Link]

Let's say a malicious user uploads a bunch of these files through a compromised Wordpress or Joomla site. I realize this is a bit of a stretch, given how secure these platforms have historically been, but bear with me. If these files contain code that hackers can use to do... whatever it is they do... wouldn't this make it very difficult to clean the account?

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 10:23 UTC (Fri) by Wol (guest, #4433) [Link]

It can be a royal nuisance if they don't ...

Photo printers now typically print the filename on the back of the print. If I take a CD in and print from the in-store machine, it has my filename on the back. If I upload them, and get them by Royal Mail or go and collect, it has some random name on the back - and how do I *easily* track down the file it was printed from?

Cheers,
Wol

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 11:03 UTC (Fri) by hummassa (subscriber, #307) [Link]

That's a non sequitur. Nothing prevents them from storing the original filename along with the photo (but not the photo under the original filename), and from printing the original filename (as opposed to the filename under which the photo is stored in their storage system) in the back of the physical print.

A hash-based DOS attack on Btrfs

Posted Dec 20, 2012 3:53 UTC (Thu) by cibyr (subscriber, #87609) [Link]

If only my operating system had some sort of system for storing the contents of files with their names, then I wouldn't have to keep track of that stuff in a database...

Oh wait, we do have that! It's called a file system! I assume it can efficiently handle arbitrary combinations of file contents and names.

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 20:36 UTC (Thu) by masoncl (subscriber, #47138) [Link]

The broader issue of collisions against crc32c was known when we selected the hash. At the end of the day, I think that if someone has write permissions to your directory they can prevent creation of arbitrary files.

But the looping is a real problem, and Pascal did a great job writing it up.

-chris

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 21:46 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

any thoughts of using siphash as suggested in the blog?

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 22:14 UTC (Thu) by masoncl (subscriber, #47138) [Link]

Without the looping, this boils down to one user with write permissions in a directory being able to prevent the creation of new files. There are lots of ways a user might do that.

I'm certainly happy to add a second hash function if there is a strong case for it, but so far I'm not convinced ;)

-chris

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 22:52 UTC (Thu) by therealmik (subscriber, #87720) [Link]

That's a cop-out... There are plenty of cases where you'd want users to be able to choose filenames but not be able to DoS administrators (or scheduled tasks) that run over those directories.

While I guess server software, such as web servers, samba, etc could always store a random filename then create an index for it in a database, that's exactly what a directory is supposed to be doing anyway - and the database probably has the same vulnerability in whatever hash it uses.

So assuming that you come around to the fact that filename hash collisions are a bad thing, it leads us to:

- Directories probably need a random hash seed for their index (one for the whole FS probably won't cut it)
- crc is not an appropriate hash, as even with a random seed an attacker can still cause collisions

A hash-based DOS attack on Btrfs

Posted Dec 13, 2012 23:06 UTC (Thu) by masoncl (subscriber, #47138) [Link]

So, what you're describing is a secure system where:

1) Users share writable directories (either with each other or with admins)
2) Blocking the creation of specific filenames causes problems
3) Blocking the creation of those filenames by creating a file with that name is not an issue

Right?

File uploads

Posted Dec 13, 2012 23:20 UTC (Thu) by man_ls (subscriber, #15091) [Link]

You are describing basically a server with file upload enabled, for example to upload user avatars. If the file already exists, just append .1 at the end and use that modified name as the key instead.

I would discard the original name and use a random filename, unless for some reason the original name was valuable for anything (e.g. for later user selection). Such use cases do exist, and besides there are many software packages that work this way. That using a new shiny filesystem would automatically convert them from ugly software to DoS targets is not admissible IMO.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 0:20 UTC (Fri) by cyanit (guest, #86671) [Link]

The most obvious way of implementing a file hosting service consists in creating a directory named after the user-chosen username or e-mail address, and putting files there with the user-chosen filename... oops...

In addition to SipHash, taking the latest 64 bits of AES-CBC ciphertext of the zero-terminated filename should work and might be faster if the CPU can accelerate AES.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 0:48 UTC (Fri) by wahern (subscriber, #37304) [Link]

Doubtful. The key setup would big the biggest bottleneck.

Siphash was designed for this exact scenario. Dan Bernstein isn't known for his lack of performance insights. The paper has gobs of analysis of performance characteristics.

Using a CRC as a hash is just stupid. And I say that as someone who learned that the hard way. People only used it because of the lack of a decent alternative--newer hashes like murmur, etc, didn't have enough proven guarantees to inspire confidence. Siphash was designed to be _the_ go to hash for data structures, with both decent performance and a rigorous security analysis.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:06 UTC (Fri) by masoncl (subscriber, #47138) [Link]

Going back to the original example, which programs include all three of my conditions? The one example listed here already has a collision scheme included, so the program in theory can get around it.

This isn't a new issue, reiserfs had a similar hashing scheme. Other hashes are different, but crc32c is implemented in hardware on newer CPUs, so there is a real performance benefit here.

-chris

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:21 UTC (Fri) by therealmik (subscriber, #87720) [Link]

So the obvious (although contrived) example, would be a CMS where the username is prefixed to the filename (rather than put in a subdirectory).

Another example would be somebody just wanting to hog a lot of in-kernel CPU time, and causing conflicts with their own filename, even though they're rate-limited by higher-level schemes (eg. server-level rate-limits).

Creating a conflict with a lock file created in the same directory is probably another example.

As a previous poster alluded, crc32c is good for detecting random corruption, not for hash tables or malicious corruption.

The CPU implementation of crc32c argument is also deeply flawed - extra tree walks are not worth the few cycles saved, as tree walks are vastly more expensive.

Remember that hash tables are only O(1) in a "perfect hash" scenario, so the hash table performance approaches what you'd expect from O(n) as the hash gets worse.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 3:30 UTC (Fri) by davidescott (guest, #58580) [Link]

I don't think Chris Mason really needs to be reminded how hash tables work. I'll trust he is up on the theory.

Chris (if you are still monitoring the thread),
How would one go about strengthening the FS and making it more resistant to these kinds of attacks? I gather that BTRFS is capable of supporting other hashes -- what are the various options down the road? It it an option at filesystem creation time to use a more collision resistant hash? How big a hash can one use? If one had a hardware SHA-512 and wanted to waste the disk space could they? Are there ways to salt hashes using something secret in the directory structure? Could the Hash be replaced with red-black trees etc?

Given the current on disk layout where would the introduction of protections against this kind of attack introduce btrfs2?

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 12:35 UTC (Fri) by masoncl (subscriber, #47138) [Link]

We can easily add a different hash to the directory format. There are 64 bits available for the hash, although you're limited to 32 bits on 32 bit userland thanks to the readdir/seekdir/telldir interfaces.

Salting etc are also more than possible. I personally don't think the hash makes the shared directories in any way secure, but I've been wrong many times before ;)

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 16:09 UTC (Fri) by cesarb (subscriber, #6266) [Link]

I believe the ext* family uses salting (the "Directory Hash Seed" field you can see in dumpe2fs). This should be enough to make the hash unpredictable as long as the attacker does not know the seed, and the only way to know the seed should be having raw block access to the partition and reading the superblock.

But if the hash output is visible via telldir, the attacker can still find out the salt unless you are using strong crypto, which would not be very good for performance. I think the hash output is not visible via telldir on the ext* family.

You could play games like having a per-directory randomly chosen salt, but if you are getting to that point IMHO it is better to work on making the algorithms robust against degenerate edge cases.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 17:01 UTC (Fri) by davidescott (guest, #58580) [Link]

It might not be a bad idea to salt, even if the salt can be determined through telldir. As you pointed out anyone running local code has much more obvious methods of attack (touch "filename" instead of touch "hash_collision_with_filename"), and a per directory salt would prevent most of the remote attacks suggested in the comments.

Hopefully nobody exports a "tell/read/seekdir" interface over php.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 17:35 UTC (Fri) by dakas (guest, #88146) [Link]

Uh, with a CRC salting a whole directory with the same salt will cause exactly the same collisions as without salt. The salt will just cause the final CRC to be XORed with a constant.

A CRC is not a cryptographic hash. Salting is useless, and identical prefixes have identical impact on the result.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 21:56 UTC (Fri) by masoncl (subscriber, #47138) [Link]

I should have made it clear that salting only makes a difference if we add an optional second hash.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:13 UTC (Fri) by cyanit (guest, #86671) [Link]

But the key setup only needs to be done at filesystem mount time; that said, I have no idea how AES-NI accelerated AES-CBC compares to SipHash in practice.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 2:54 UTC (Fri) by cyanit (guest, #86671) [Link]

So, AES-128 takes 80 cycles of latency on Sandy Bridge with AES-NI for each 16 byte block vs SipHash's claim of 134 cycles per 16 bytes on the best 64-bit CPU they give data for.

However, if other instructions can be scheduled by the out-of-order CPU in the meantime, then AES should be much faster, since the reciprocal throughput is actually 10 cycles on Sandy Bridge and should only take up one execution unit.

Without AES instructions things are most likely reversed.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 11:23 UTC (Fri) by daniel (subscriber, #3181) [Link]

A few tens of nanoseconds in either case. That is spectacularly good performance. I'll take a hash designed by a hash expert any day over a cheap hack or something I rolled myself that seems to pass my tests. Been there, wish I'd had Siphash then. Using CRC for the hash was just a bad idea, Chris must have been having a bad day that day. Best time to fix it is now, it only gets harder later.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 15:27 UTC (Fri) by drag (subscriber, #31333) [Link]

BTRFS folks responded that supporting other hashing algorithms is on their list of patches to apply.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 2:11 UTC (Sat) by daniel (subscriber, #3181) [Link]

Anyway, it is apparent that issue is not merely a linear increase in directory operation time, but a for-real bug that looks like some kind of live-lock. The hash can and should be fixed, because although the attack in question turned out to achieve something other than its original intent, there is no question that trivially being able to force hash collisions is a vulnerability.

The most important result here is not the inappropriateness of CRC as a hash, but that Btrfs is a complex beast, still in development, a critical bug was just exposed, and likely a few more remain. Given wide enough testing and continued developer commitment it will become stable, but today it is not. From where I sit, it looks like Ext4 will be wearing its standard Linux filesystem crown for some time yet.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 6:17 UTC (Sat) by SEJeff (guest, #51588) [Link]

Unless tux3 matured first *poke*

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 13:18 UTC (Sat) by vonbrand (subscriber, #4458) [Link]

From a cursory look over, I'd say 220 *minutes* isn't just a bad hash degenerating to a linear search...

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 2:46 UTC (Sat) by wahern (subscriber, #37304) [Link]

But AES-NI isn't the complete algorithm. Siphash numbers are for the complete operation, including compression and output. You could probably cobble something basic together with the internal AES mixer (what AES-NI optimizes), but then you've just invented your own algorithm with unknown characteristics, all in an attempt to gain negligible performance improvements. That sort of premature optimization is what causes these sorts of bugs in the first place.

A hash-based DOS attack on Btrfs

Posted Dec 14, 2012 10:30 UTC (Fri) by robert_s (subscriber, #42402) [Link]

Please see my above comment about zipfiles

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 8:19 UTC (Sat) by eupator (guest, #44581) [Link]

> The broader issue of collisions against crc32c was known when we selected
> the hash. At the end of the day, I think that if someone has write
> permissions to your directory they can prevent creation of arbitrary files.

The problem is far from being limited to such case. Besides the already mentioned unpacking of archives, mirroring remote directories is very common - consider rsync, recursive wget, or version control systems.

A hash-based DOS attack on Btrfs

Posted Dec 15, 2012 17:49 UTC (Sat) by Karellen (subscriber, #67644) [Link]

Hmmm.....version control systems. That provokes an interesting thought.

Create a Linux patch series, preferably btrfs-related patches for maximum lulz, whose git SHA-1 ids (any blob, tree, or commit ids should be fine) end up generating filenames in the same object-subdirectory with CRC32 collisions, and send a pull request to lkml.

Hilarity ensues!

The object-subdirectory requirement means there's a 40-bit keyspace to search, rather than 32-bit, but one could probably find enough whitespace/punctuation entropy in comments to generate enough collisions to produce at least some effect, even if you can't find enough to lock up a system.

Also, AIUI, pull requests are unconventional and might be ignored, but if you send the patch series directly to the list, the implicit rebasing will destroy the object ids (and therefore the collisions) - unless all the collisions are in file blobs and none of the files have been touched before your patches are applied.

Anyone any idea how plausible/expensive that attack would be?

A hash-based DOS attack on Btrfs

Posted Dec 16, 2012 10:52 UTC (Sun) by akeane (subscriber, #85436) [Link]

Yet another example of modernization for it's own sake!

Consider: even on MS-DOS you had 11 fields of alphanumerical characters to store your "records", even with upper case that's at least 24^11 possible filenames!

Behold:

C: dir
AKEANE_I.SAW
ESOME.MAN

In fact you could probably store all the info you want just in the filenames, think of the parchment you would save!

Then (sigh) all this rattling on about "journaling" file-systems, journals are way too modern, I would like an Almanac based system.

Rather than the worrisome thoughts caused by the fsick fix-me-do stressing out the poor kernel when it can't mount anything, I would much prefer an image of the kernel laying back in it's rocking chair, smoking a corn pipe, and thinking: "Ah yes, ma boy, inode 3144256 only gets recovered when the full moon passes the new born lambs before the high tide, 'tis written in ye olde Almanac"

On that note I wish ye all a Merry Yuletide and will sit at the bottom of my fire hole waiting for the delivery of my new Xmas monocle, so I can pay homage to the late great Sir Patrick Moore (who invented the concept of going outside in the night and looking up)

oh, and imbibing Gin, 'tis the season!

Happy Holidays!




A hash-based DOS attack on Btrfs

Posted Dec 17, 2012 15:46 UTC (Mon) by butlerm (subscriber, #13312) [Link]

Any comment on why the filesystem is apparently taking 220 minutes to delete 500 files in some cases? It should only be a tiny fraction of that even if all the file names hash to the same value.

A hash-based DOS attack on Btrfs

Posted Dec 18, 2012 13:13 UTC (Tue) by engla (guest, #47454) [Link]

Please read the updated section of the original story, "the second attack does NOT generate an infinite loop within the btrfs code, but merely within the bash expansion code which is responsible to expand the command line rm *."

A hash-based DOS attack on Btrfs

Posted Dec 18, 2012 15:08 UTC (Tue) by butlerm (subscriber, #13312) [Link]

Thanks. So in other words, the first problem might possibly be hit in practice, and really ought to be fixed soon, but the second problem isn't likely to yield any real world consequences unless an attacker is in a position to generate tens of thousands of files with names that all hash to the same value. No?

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds