SipHash in the kernel
SipHash is the creation of Jean-Philippe Aumasson and (inevitably) Daniel J. Bernstein; readers interested in the details can find them in this paper [PDF]. It was designed with a number of objectives in mind, starting with being a cryptographically secure hash function. In practice, what that means is that it is computationally infeasible to derive the input data from its corresponding hash, or to derive the secret data used in the hashing operation even given the ability to see the output for a set of chosen inputs. Another important objective was speed, especially with smaller inputs. Many existing hash functions have a high setup overhead; that cost matters little when large amounts of data are being hashed, but it hurts for the hashing of smaller inputs. As it happens, many of the hashing operations in the kernel are applied to small chunks of data, so lower overhead would be welcome.
The list of SipHash users is large and growing; many projects have adopted it in an attempt to defend against hash-collision attacks. These attacks exploit a known hash function to cause a hash table to degrade into a simple linear list, with potentially devastating effects on performance. The Python language switched to SipHash in 2013; other users include various BSD distributions, Perl, Ruby, Rust, and more. This move is not universally acclaimed, but most seem to see it as a step in the right direction. Thus far, however, the kernel has lacked a SipHash implementation.
What does the kernel use instead? As might be expected with a large body of code like the kernel, different algorithms are employed in different settings. The generation of TCP sequence numbers, for example, is done using the MD5 hash function, which has been regarded as insecure for some time. That is potentially problematic, since an attacker who can predict sequence numbers can interfere with or inject data into network connections. The get_random_int() and get_random_long() functions used extensively throughout the kernel are also based on MD5. The "syncookies" that can be employed to defend against SYN flood attacks are, instead, generated with SHA-1 which, while more secure than MD5, is showing its age as well. SHA-1 is also used in the core random-number generator, in the BPF subsystem, and elsewhere.
Use of those algorithms, however, pales next to the usage of a function called jhash() (and its variants), a Jenkins hash implementation. The kernel contains a lot of hash tables, and, as a general rule, jhash() is the hash function used to place data into hash buckets. This function has the advantage of being quite fast, but it makes no claims to cryptographic security. Many in-kernel users include some secret data of their own as a defense against collision attacks. But if the results of the hash are visible to a hacker (and simply listing the contents of the table in order may suffice), then deriving that secret data is a relatively easy task.
Jason Donenfeld set out to replace all of these hashing functions with an implementation of SipHash inside the kernel. SipHash uses an explicit secret key for collision defense, so the first order of business for an in-kernel user is the generation of that key:
#include <linux/siphash.h> siphash_key_t hash_secret; get_random_bytes(&hash_secret, sizeof(hash_secret));
The use of get_random_bytes() is, according to the documentation, the only proper way to initialize this secret. Thereafter, of course, kernel code should take care not to expose the secret outside of the kernel itself, or the protection against hash collisions will be lost.
The hashing of data is done with:
u64 siphash(const void *data, size_t len, const siphash_key_t *key);
The return value will be a 64-bit hash of the input data. There are a number of optimized variants for constant-size input data, but most developers need not worry about those since the generic version will pick one of those at compile time if appropriate.
SipHash is significantly faster than either MD5 or SHA-1, while producing results that are deemed to be more secure. So the replacement of the older algorithms with SipHash should not be a difficult decision to make. The same is not true for jhash(), which is much faster than SipHash. In an attempt to convince jhash() users to make the switch, Donenfeld added a "HalfSipHash" variant:
u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);
This version uses a reduced variant of the SipHash algorithm to produce a
smaller and less secure result. Users of jhash() who do not want
to pay the cost of SipHash might just be convinced to use this version
instead, an outcome described by Donenfeld as "a terrifying but
potentially useful possibility
". Potential users will note that,
while hsiphash() is faster than siphash(), it still takes
about three times as long as jhash() to produce a result. The
better security that comes from using it should justify the cost in many
settings, but it also seems likely that jhash() won't be going
away anytime soon.
The patch set has been through a few revisions, with some relatively small changes being made. The biggest complaint about it seems to have come from networking maintainer David Miller, who was not entirely happy about moving away from hash functions that are implemented by the CPUs themselves:
The interesting thing, as a couple of participants pointed out, is that Linux is not actually using these hashing instructions even on the hardware that supports them. Among other things, they require some setup cost that takes away a lot of the performance benefit, especially for small input data arrays. So the existence of hardware-based implementations is, for now, not relevant.
In any case, Miller applied the
patches on January 9, so they should make it into the mainline
during the 4.11 merge
window. The process of converting at least some of those jhash()
users has not
yet begun, though, and can be expected to take some time.
Index entries for this article | |
---|---|
Kernel | Hashing functions |
Security | Linux kernel |
Posted Jan 13, 2017 21:48 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (3 responses)
Posted Jan 16, 2017 12:15 UTC (Mon)
by robbe (guest, #16131)
[Link] (2 responses)
Only the latter is relevant if it is used as described in the article.
Posted Jan 17, 2017 19:56 UTC (Tue)
by pjones (subscriber, #31722)
[Link] (1 responses)
Though if we all act on that now, maybe that will reduce the interest enough to slow the research down ;)
Posted May 1, 2017 15:05 UTC (Mon)
by khim (subscriber, #9252)
[Link]
Yes, sure, sooner or later it would be broken, but it'll take many, many, years!
Posted Jan 21, 2017 11:40 UTC (Sat)
by anton (subscriber, #25547)
[Link] (3 responses)
Posted Jul 15, 2024 9:11 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link] (2 responses)
It might not be too feasible.
Posted Jul 15, 2024 13:34 UTC (Mon)
by Wol (subscriber, #4433)
[Link] (1 responses)
I got involved in a discussion about hashing, for some reason, and the kernel algorithms go as far as completely rebuilding the hash list if they think an attack is under way.
It's not that expensive in the grand scheme of things. Either the hash list is small, or it's badly degraded. So either the cost IS minimal, or it's RELATIVELY minimal.
Cheers,
Posted Jul 15, 2024 16:05 UTC (Mon)
by Wol (subscriber, #4433)
[Link]
Set up a second hash table.
So performance is still degraded for a little bit as the old table still has to be searched, but it improves rapidly as normal processing will remove a load of entries from the degraded hash, and RCU will remove the rest. Or even it doesn't bother cleaning up the old hash straight away, as it expects normal operation to reap a lot of entries pretty quickly.
Cheers,
SipHash in the kernel
SHA-1 „completely broken“
SHA-1 "completely broken"
SHA-1 "completely broken"
For hash tables one can combine performance in the non-attack case with performance in the under-attack case: Use a fast hash function by default, monitor the collisions, and if collisions rise significantly beyond the statistically expected number, switch to a cryptographically secure hash function.
SipHash in the kernel
SipHash in the kernel
SipHash in the kernel
Wol
SipHash in the kernel
Tell the search algorithm to search both tables.
Add all new entries to the new table.
Use something like RCU to copy entries from the old table to the new.
Wol