
Python adopts SipHash

By Jake Edge
November 27, 2013

Hash collisions are a fact of life, but one that can have serious consequences when attackers can control the values being hashed. We looked at this problem, which can allow denial-of-service attacks, back in January 2012 for Python, PHP, Java, Ruby, JavaScript, and other dynamic languages. While fixes were made at that time, the hash function used by Python was still vulnerable to certain kinds of attacks. That situation is now changing, with Python Enhancement Proposal (PEP) 456 having been accepted for inclusion in the upcoming Python 3.4 release. That means Python will be using SipHash for its hash function going forward.

Hash functions are used to reproducibly turn an arbitrary string (or series of bytes) into a single value (of a length determined by the type of hash) that can be used for various purposes. For example, cryptographic hash functions are used to derive digest values for digital files that can then be used with signature algorithms to digitally "sign" documents or other data (e.g. software distribution packages). Hash functions are also used for data structures like dictionaries (aka hashes or associative arrays), where the function maps each key to a value that is used to locate the data associated with that key in the dictionary.
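
To give a flavor of the digest use case, Python's own hashlib module computes such values; the snippet below uses SHA-256 purely as an example, since the article does not single out any particular algorithm:

    import hashlib

    def sha256_of(path):
        """Digest of a file, e.g. to check a downloaded package against a
        published checksum."""
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
        return digest.hexdigest()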

But, with any hash function, multiple keys will hash to the same value. In a data structure, that is often handled by making each hash "bucket" actually be a linked list of the colliding entries—normally just a few. Operations, such as lookup, insert, or delete, on keys that hash to a particular bucket then have to traverse the list. If the number of collisions is low, the effect of a short list traversal is minimal, but if that number is high, it can significantly impact the performance of those operations.
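
A rough sketch of that structure in Python might look like the class below; the names are invented for the example, and CPython's actual dict uses open addressing rather than chaining, but the effect of a crowded bucket is the same:

    class ChainedTable:
        """A toy hash table where each bucket is a chain of colliding entries."""

        def __init__(self, nbuckets=8):
            self.buckets = [[] for _ in range(nbuckets)]

        def _bucket(self, key):
            # The hash picks the bucket; colliding keys share one chain.
            return self.buckets[hash(key) % len(self.buckets)]

        def insert(self, key, value):
            chain = self._bucket(key)
            for i, (k, _) in enumerate(chain):
                if k == key:              # key already present: replace in place
                    chain[i] = (key, value)
                    return
            chain.append((key, value))

        def lookup(self, key):
            for k, v in self._bucket(key):   # traverse the (hopefully short) chain
                if k == key:
                    return v
            raise KeyError(key)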

Normally, languages try to choose hash functions that will fairly evenly spread the expected key values into the hash space. But if the hash function is known to an attacker, and they can arrange to provide the key values to be hashed, denial-of-service attacks are possible. One way attackers can do that is with HTTP POST (form submission, essentially) requests. Many web application frameworks helpfully collect up all of the POSTed variables into a dictionary for delivery to the application. Just supplying a list of variables that all hash to the same bucket (and possibly submitting that POST multiple times) may be enough to bring a web server to its knees.
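
The degradation is easy to reproduce locally without any web framework at all, by giving a key type a deliberately constant hash so that every key lands in the same bucket; this is only a sketch of the slowdown, not an exploit:

    import time

    class Collider:
        """Stand-in for attacker-chosen keys that all hash to one bucket."""
        def __init__(self, n):
            self.n = n
        def __hash__(self):
            return 42                     # constant hash: worst-case collisions
        def __eq__(self, other):
            return isinstance(other, Collider) and self.n == other.n

    for count in (1000, 2000, 4000):
        keys = [Collider(i) for i in range(count)]
        start = time.perf_counter()
        table = {key: None for key in keys}
        elapsed = time.perf_counter() - start
        # Insertion time grows roughly quadratically as the key count doubles.
        print(count, "keys:", round(elapsed, 3), "seconds")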

This was all discovered in Perl in 2003, then rediscovered for other languages in late 2011. A bug was opened for Python, then closed a few months later after the hash function used by the interpreter was randomized based on a flag (-R) given at run time. That didn't fully solve the problem, however, since effectively only 256 separate functions were used—an attacker could use various techniques to determine which one, and thus could still cause a denial of service. In fact, Jean-Philippe Aumasson and Daniel J. Bernstein developed a proof-of-concept attack to recover the seed used to randomize the hash function for Python 2.7.3 and 3.2.3.
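
With randomization enabled (the default since Python 3.3; Python 2.7.3 needed the -R flag), the same string hashes to a different value in each interpreter process, which is what the seeding was meant to achieve. Assuming a 3.3-or-later interpreter, the effect is easy to observe:

    import subprocess
    import sys

    # Each child process picks its own random seed, so the two hashes
    # will (almost certainly) differ; set PYTHONHASHSEED to pin the seed.
    cmd = [sys.executable, "-c", "print(hash('collision-test'))"]
    print(subprocess.check_output(cmd).decode().strip())
    print(subprocess.check_output(cmd).decode().strip())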

Shortly after the original bug was closed, a new bug was filed that recognized the inadequacy of the solution, but it took 18 months or so before PEP 456 was formally accepted for inclusion into Python 3.4. PEP author Christian Heimes looked at several different hash functions before settling on the SipHash24 variant. It "provides the best combination of speed and security" and several other high-profile projects (e.g. Ruby, Perl, Rust, FreeBSD, Redis, ...) have chosen SipHash.

Unlike earlier hash functions used by Python and others, SipHash is a cryptographic hash. Python's current implementation uses a modified Fowler-Noll-Vo (FNV) hash function, which was changed to add a random prefix and suffix to the bytes being hashed. But Heimes is convinced that "the nature of a non-cryptographic hash function makes it impossible to conceal the secrets".
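
The idea can be sketched in a few lines of Python; this is only an illustration of a prefix/suffix-randomized, FNV-style hash, not CPython's actual C implementation:

    import os

    _PREFIX = int.from_bytes(os.urandom(8), "little")   # secret per-process values
    _SUFFIX = int.from_bytes(os.urandom(8), "little")
    _FNV_PRIME = 1099511628211                          # 64-bit FNV prime
    _MASK = (1 << 64) - 1

    def randomized_fnv(data):
        """FNV-style hash of a bytes object, seeded and finalized with secrets."""
        h = _PREFIX
        for byte in data:
            h = ((h ^ byte) * _FNV_PRIME) & _MASK
        return h ^ _SUFFIX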

At the time that PEP 456 was being written, there was another discussion of the issue on the python-dev mailing list. Heimes started the conversation by soliciting opinions on whether the hash function should be a build-time option or be switchable at run time. Most felt that choosing the algorithm at compile time was sufficient, but some, including Python benevolent dictator for life (BDFL) Guido van Rossum, were not at all convinced that any change was needed.

Van Rossum was concerned that the problem is largely manufactured by "some security researchers drumming up business"—that it is only of theoretical interest. But Armin Rigo disagreed:

It should be IMHO either ignored (which is fine for a huge fraction of users), or seriously fixed by people with the correctly pessimistic approach. The current hash randomization is simply not preventing anything; someone posted long ago a way to recover bit-by-bit the hash randomized used by a remote web program in Python running on a server. The only benefit of this hash randomization option (-R) was to say to the press that Python fixed very quickly the problem when it was mediatized :-/

In the end, Van Rossum did not put his foot down; he said that he was "fine with a new hash function as long as it's either faster, or safer and not slower". SipHash24 has been within a few percent of the performance of the existing hash function on several different benchmarks. There are concerns that it will impact performance for short keys (less than, say, seven bytes) because it has some setup and teardown costs, so switching to a faster but less secure hash for short keys is being investigated.
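
SipHash-2-4 itself is compact enough to sketch in pure Python by transcribing the published algorithm (CPython's real implementation is in C and works on its internal string representations); the state initialization at the top and the four finalization rounds at the bottom are the fixed per-call costs that dominate for very short keys:

    _MASK = (1 << 64) - 1

    def _rotl(x, b):
        return ((x << b) | (x >> (64 - b))) & _MASK

    def _sipround(v0, v1, v2, v3):
        v0 = (v0 + v1) & _MASK; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
        v2 = (v2 + v3) & _MASK; v3 = _rotl(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & _MASK; v3 = _rotl(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & _MASK; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
        return v0, v1, v2, v3

    def siphash24(key, data):
        """SipHash-2-4 of `data` (bytes) under a 16-byte secret `key`."""
        k0 = int.from_bytes(key[:8], "little")
        k1 = int.from_bytes(key[8:16], "little")
        # Fixed setup: four 64-bit state words, constants from the SipHash paper.
        v0 = k0 ^ 0x736F6D6570736575
        v1 = k1 ^ 0x646F72616E646F6D
        v2 = k0 ^ 0x6C7967656E657261
        v3 = k1 ^ 0x7465646279746573
        # Compression: two rounds per 8-byte block of input.
        full = len(data) // 8 * 8
        for i in range(0, full, 8):
            m = int.from_bytes(data[i:i + 8], "little")
            v3 ^= m
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
            v0 ^= m
        # Final block: leftover bytes plus the input length in the top byte.
        m = (len(data) & 0xFF) << 56
        for i, byte in enumerate(data[full:]):
            m |= byte << (8 * i)
        v3 ^= m
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
        # Fixed teardown: four finalization rounds.
        v2 ^= 0xFF
        for _ in range(4):
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        return v0 ^ v1 ^ v2 ^ v3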

According to the PEP, there are multiple places in the CPython code that use their own version of the hash algorithm: "the current hash algorithm is hard-coded and implemented multiple times for bytes and three different Unicode representations". That would make it harder for someone trying to put in their own replacement, so the PEP also proposes reworking the internals of Python such that the hash function can be replaced in a single location. That will appear in Python 3.4 as well.
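
Python 3.4 also exposes the chosen algorithm at run time through sys.hash_info, so a program (or a curious user) can check what the interpreter was built with:

    import sys

    # Reports the configured string hash algorithm, e.g. 'siphash24',
    # along with the width of the hash and of its random seed.
    print(sys.hash_info.algorithm)
    print(sys.hash_info.hash_bits, sys.hash_info.seed_bits)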

It seems a bit strange that it took this long for Python to fix the problem. As Rigo said, ignoring the problem might be a reasonable approach for a substantial fraction of Python users, but that was true before the previous fix was applied as well. Given that Python developers presumably don't just want to apply a cosmetic fix, it is a little surprising it took this long to get to the "proper" solution. But Larry Hastings may be right with his suggestion that "there was enough bike shedding that people ran out of steam" to immediately address the problem.

Given how widespread SipHash is now for dictionary hash functions, we are potentially vulnerable to some kind of breakthrough in finding collisions in that algorithm. But at least there are a full 64 bits of entropy being used by SipHash (rather than the eight bits for the modified FNV function). That should at least make brute-force attacks infeasible—we will just need to keep an eye out for cryptographic breakthroughs down the road.


Index entries for this article
Security: Python
Security: Vulnerabilities/Denial of service



DJB

Posted Nov 28, 2013 1:12 UTC (Thu) by cesarb (subscriber, #6266) [Link]

It seems SipHash is yet another DJB algorithm. While his implementation choices are often... unorthodox, his algorithms seem to be consistently great.

So where did it end up for earlier versions of Python?

Posted Nov 28, 2013 1:25 UTC (Thu) by ras (subscriber, #33059) [Link] (7 responses)

You are not kidding about the bike shedding. The discussion on issue13703 http://bugs.python.org/issue13703 is longer than I care to read. It was originally suggested the fix be backported to 2.4 onwards. http://bugs.python.org/issue13703#msg150525

Is that happening? And am I right in assuming randomisation will be off by default for everything prior to 3.4?

The next thing to fix is the non-deterministic behaviour of the regular expression recognisers. They are another source of a potential DOS, and unlike dict table collisions, which never seem to happen in real life, having programs die because some input to a regex caused pathological behaviour has happened to me several times.

So where did it end up for earlier versions of Python?

Posted Nov 28, 2013 8:24 UTC (Thu) by kleptog (subscriber, #1183) [Link] (6 responses)

Is that (fix the non-deterministic behaviour of regexes) even possible? Most regexes are fairly simple, but as the quantifiers increase it becomes more likely that you can make a pathological input for it. Can you do anything other than just flagging it?

So where did it end up for earlier versions of Python?

Posted Nov 28, 2013 10:34 UTC (Thu) by epa (subscriber, #39769) [Link] (4 responses)

Provided you limit yourself to true regular expressions (forgoing non-regular extensions such as backreferences), I believe the worst-case performance can be made tractable. This paper http://swtch.com/~rsc/regexp/regexp1.html has more details.

So where did it end up for earlier versions of Python?

Posted Nov 29, 2013 12:23 UTC (Fri) by k8to (guest, #15413) [Link] (3 responses)

Sadly, the modern regex implementations in common use do not limit themselves in this way.

Moreover, the fancy regex engines like pcre are actually significantly faster than this implementation for the typical cases.

So it's really a hard sell. You have to convince the users of your regex interface to accept losing all the functionality they now rely on, and also accept a notable speed penalty for your expected cases, in the name of achieving more predictable behavior for exceptional cases.

For some software that's reasonable. For other software it's better to just limit the recursion and handle the failures.

So where did it end up for earlier versions of Python?

Posted Nov 29, 2013 17:18 UTC (Fri) by epa (subscriber, #39769) [Link] (2 responses)

Actually, I think it is backreferences and other non-regular features that are the 'exceptional case' - 90% of regexps do not use them. And most regexp matches are so fast that they are not the performance bottleneck in a program that uses them. The cases where people care about speed are obviously the ones that run particularly slowly, which is often (though not always) due to poor worst-case performance.

It would therefore be better to use the safer (as in more predictable) NFA implementation by default, with DFA available as an option if you want to do non-regular matching, or if you've made a conscious decision to sacrifice worst-case performance for better performance on your typical workload.

But in perl, at least, the regexp engine is so tied up with the guts of the language it would be a Herculean task to replace it.

Limiting the allowed recursion is a good idea and it would be nice if languages offered a friendly way to set this.

So where did it end up for earlier versions of Python?

Posted Nov 30, 2013 2:25 UTC (Sat) by mchapman (subscriber, #66589) [Link] (1 responses)

> But in perl, at least, the regexp engine is so tied up with the guts of the language it would be a Herculean task to replace it.

That used to be the case; however, since Perl 5.9.5, different RE engines can be used in place of the native one. See http://search.cpan.org/search?query=re%3A%3Aengine for some examples.

So where did it end up for earlier versions of Python?

Posted Nov 30, 2013 8:24 UTC (Sat) by epa (subscriber, #39769) [Link]

Yes, you can plug in other engines, but all of those (even PCRE I expect) have different semantics to the builtin regexp engine. I was talking of replacing the standard engine used in all code, with all the same behaviour including Unicode handling and probably even 'experimental' features like (?{}). That would be a huge effort.

After all, the idea is not that the programmer can specially choose a different regexp implementation with more predictable performance - that is already the case - but to change the default so that all programmers, even those who don't read LWN and didn't study finite automaton theory, can write code which doesn't have hidden DoS performance bugs.

So where did it end up for earlier versions of Python?

Posted Nov 29, 2013 1:18 UTC (Fri) by ras (subscriber, #33059) [Link]

> Is that (fix the non-deterministic behaviour of regexes) even possible?

Yes and no.

We use finite automata to recognise regexes. An automaton is just a fancy name for a directed graph. The lines on the graph that connect one state to the next are labelled with an input character. If there is no line from the current state labelled with the next input character, the regex didn't match the input. If there is a line, you move on to the next state and input character. If this eventually gets you to the end of the graph, you have a match. The re.compile() call compiles your pattern into said graph. At least that is the theory. In practice Python's re module is derived from the C PCRE library, which looks to work differently again. It compiles to a mini program - but it turns out that program is equivalent to an automaton.

There are two distinct types of automata - deterministic and non-deterministic. (Actually, in theory there is also finite and infinite, which refers to the number of states in the graph. For obvious reasons practical computers only deal with finite graphs.) The sole difference between an NFA (non-deterministic finite automaton) and a DFA (deterministic finite automaton) is that an NFA state is allowed to have several lines labelled with the same input character. Which is where the "non-deterministic" comes from - when confronted with that input character, an NFA has no deterministic way of knowing which line to choose. In practice a real computer must execute all of them, either in parallel or using backtracking (which is effectively what PCRE does).

That difference between NFA and DFA oddly doesn't affect their power to recognise complex regular expressions much. What it does affect is their performance characteristics. The space used by the compiled graph of an NFA is of the same order as the size of the regular expression it is compiled from (i.e. the relationship between the size of the regular expression and the size of the compiled graph is linear). However, for some inputs its execution time can be exponential in the number of characters in the input.

In contrast, a DFA's execution time is always linear in the size of its input. In practice the execution is faster than that implies, because an NFA has to do substantially more bookkeeping; studies I've seen say around 20 times faster. However, the space used by a DFA's compiled graph is, in the worst case, an exponential function of the size of the pattern it is executing. Worse, notice I used the term "execution time" when describing their relative speed. That is because an interpreted language like Python always has to compile the regular expression at run time, and that compiling can also take an exponential amount of time. So this is where the "no" in my "Yes and no" answer comes from. Both can take exponential amounts of time.

A backtracking recogniser like PCRE gets a lot of its speed by delaying most of the work until it recognises the input. This can save a lot of time (think of a lot of '|' alternatives that are never executed). That's where PCRE gets its speed from in the "usual case". So if you are recognising a regular expression just once, PCRE is usually faster than other methods when you include compile time. But it is also the worst in execution time if it strikes a pathological input.

So now we get to the "yes" part of my "Yes and no" answer. When we are talking about protecting against a DOS, the nice thing about a DFA is that it's predictable. Predictable in the sense that while the compile step incurs a large upfront cost, that cost is always the same in every run of the program - including testing runs. If it is a problem, you change your regex during testing. So the user will never see a program fail because of some input the programmer hadn't tested for. PCRE is at the opposite end of the spectrum. You will never see an issue from the speed of PCRE compiles, and unless you are very, very good you won't generate a pathological input during testing. But your program can mysteriously fail (well, hang) in production.

That had never happened to me for 20 years, and now suddenly it's happened every couple of months. Having programs fail on inputs I am not, and never will be, smart enough to generate test cases for is not good. Looking at this from the standpoint of a software engineer trying to build reliable things, I'll wear the upfront cost of DFAs every time. Hell, the upfront cost of DFAs needn't even be incurred at run time, even in languages like Python. All the re module need do is allow you to save the compiled form somewhere so you can reuse it at run time. And guess what? If you eliminate the time it takes to compile a DFA, its execution time is always linear in the size of the input. That is how my "yes" answer arises.
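
To make the pathological case concrete, here is a small, deliberately contrived example using Python's backtracking re module; exact timings will vary, but the failed match slows down roughly exponentially with each added character:

    import re
    import time

    # Nested quantifiers force the backtracking engine to try an exponential
    # number of ways to split the input before concluding there is no match.
    pattern = re.compile(r"(a+)+$")

    for n in (20, 22, 24):
        text = "a" * n + "!"              # the trailing "!" guarantees failure
        start = time.perf_counter()
        pattern.match(text)
        print(n, "characters:", round(time.perf_counter() - start, 2), "seconds")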

what perl is doing

Posted Nov 28, 2013 15:27 UTC (Thu) by noxxi (subscriber, #4994) [Link]

A few weeks ago there was a long article about the hash algorithm(s) perl is using and why they changed again. Good companion to this article:
http://blog.booking.com/hardening-perls-hash-function.html


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds