Generic hashing functions and the platform problem
What is a kernel developer to do when they need a simple hashing function to use in a new hash table, and they find that the obvious choice provided by the kernel works poorly? The "right" option is to fix the common code. The "easy" option is to write a replacement or a workaround. The "best" option, it seems, is to make sure Linus Torvalds finds out, because this is just the sort of thing that he cares about.
Linux has a fairly simple and efficient set of hashing functions in include/linux/hash.h that work on simple input values: 32-bit or 64-bit numbers, or pointers that are cast to one of those depending on the architecture. The hash is computed by multiplying the input by one large number and ignoring any overflow, thus effectively taking the remainder of a division by another large number: 232 or 264, depending on the word size. The required number of bits are then extracted from the result.
When Thomas Gleixner was testing his second patch set for making futexes faster, he discovered that the hash values returned weren't particularly evenly distributed and, as a result, he was receiving more collisions than expected. He addressed this problem by writing a simple alternative, sparking a sub-thread exploring the problems with these hash functions. As Torvalds found during subsequent investigations, Gleixner was not the first developer to decide it was easier not to fix the broken function. When working on the RAID 5/6 code for Btrfs, David Woodhouse, or possibly Chris Mason, discovered problems with the "hash_64()" function and helpfully provided a comment:
* we shift down quite a bit. We're using byte * addressing, and most of the lower bits are zeros. * This tends to upset hash_64, and it consistently * returns just one or two different values.
It was determined that zeros in the lower-order bits result in the remaining bits not being mixed well, so there is not much variation in the output — not a particularly useful property for a hash function. The chosen response was not to fix hash_64() but to shift down the input to hide the problem.
Some eight years ago, Matthew Wilcox made some changes to the interfaces for the hash function and noted in the commit message:
So he, too, knew that there were problems with the hashing function, particularly the 64-bit version. It would be naive to think that these three are the only developers to have noticed a problem but none made the effort to fix it. In the 14 years since this code was introduced, several people have noticed a problem and no one has bothered to fix it. This is an example of what has become known as the "platform problem".
Fortunately Torvalds was interested in the futex changes and so paid attention to the second patch set from Gleixner. He made no comment on the futex changes, but did take interest in the hash changes, researching the problem and orchestrating the fix. The key issue turned out to be the number that the input is multiplied by. Supposedly based on recommendations from Donald Knuth, this number was chosen to be a prime and to be approximately the fractional part of the golden ratio multiplied by 264 (in the 64-bit case). A prime was chosen that was "bit sparse", having long runs of ones or zeros in the binary representation.
This latter choice was guided by the desire for the multiplication to be fast. On some hardware, shifts and addition or subtraction can be faster than general multiplication and with fewer runs in the binary pattern, fewer additions or subtractions are needed. For the 64-bit case, the pattern is particularly sparse, having a run of 33 one bits.
/* 2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */ #define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL
This sparseness was the problem. Torvalds's research suggested that the primality was barely interesting and a misunderstanding. As George Spelvin later explained it:
It also appears that the golden ratio is not all that magical. It is
just a convenient number that is, as Torvalds put it, "an
irrational number that is not near to 0 or 1
".
The important property was to have "roughly 50% of the bits set
in a random pattern
" so the resultant mixing of bits would hide
any patterns in the input. The current hash function, it seems,
emphasizes exactly the wrong qualities.
Once the problem was understood, fixing it was relatively straightforward: just pick a better multiplier with more "randomness" in the bit pattern. This was done by approximating an irrational value and not worrying about primality or bit-sparseness. On hardware without 64-bit multiplication, the two 32-bit halves of the 64-bit input are hashed sequentially so two 32-bit multiplications are used. This doesn't return a 64-bit number, but no code ever wants more than 32 bits of hash, so not providing the full 64 is no loss.
Spelvin made good use of the opportunity provided by this need to adjust the hash functions and examined other hashing code, fixing up the hash_mem() and hash_str() function in the sunrpc code (used by nfsd) and suggesting improvements for some simplistic hashing used in the inode cache.
A minimal fix has landed for Linux 4.6-rc7 with a more complete fix expected for v4.7. It is good to know that this problem with the Linux kernel platform has been addressed, but it does lead one to wonder what other problems there are that have been conveniently ignored by many of us, and just need a little light to shine on them at the right time for the problem to be quickly fixed.
Index entries for this article | |
---|---|
Kernel | Hashing functions |
GuestArticles | Brown, Neil |
Posted May 19, 2016 9:36 UTC (Thu)
by Sesse (subscriber, #53779)
[Link] (1 responses)
Posted May 19, 2016 13:51 UTC (Thu)
by clemensg (guest, #94377)
[Link]
Posted May 19, 2016 9:54 UTC (Thu)
by mjthayer (guest, #39183)
[Link]
Posted May 20, 2016 15:17 UTC (Fri)
by willy (subscriber, #9762)
[Link] (3 responses)
Perhaps we could add tests of the "goodness" of the hash functions to tools/testing? It's been invaluable for preventing regressions of the radix tree code.
Posted May 21, 2016 0:00 UTC (Sat)
by neilbrown (subscriber, #359)
[Link] (2 responses)
It is part of the human condition, visible in many communities, that it is easier to say "we could" or "we should" rather than "I will". I think the "platform problem" is, in part, how that condition manifests in this community.
Not that it is necessarily wrong to say "we could": people who say "I will" too often can easily get burnt out.
Some sort of balance is best: where many in the community feel empowered, encouraged, and even permitted, to say "I will", without feeling undue pressure (so no "You should"!).
Posted May 21, 2016 0:10 UTC (Sat)
by willy (subscriber, #9762)
[Link]
But this is a great opportunity for someone who wants to improve the kernel in a quantifiable way to get involved. It's almost entirely user-space programming, but it involves working with kernel code. Everybody agrees we need better testing, so there's likely to be positive feedback.
Now, how best to bring this idea to the attention of someone who has time and inclination to work on it?
Posted May 25, 2016 11:27 UTC (Wed)
by robbe (guest, #16131)
[Link]
Alternatives? Anonymous bug reporting? Adding it to a list of newbie projects?
Posted May 20, 2016 16:32 UTC (Fri)
by jlargentaye (subscriber, #75206)
[Link] (2 responses)
They are obviously a very skilled hacker who have been around Linux for a long time. Though I can't find the reference immediately, I remember them correcting/optimizing someone's PPC(?) assembly code, then stating mild uncertainty because they "don't have access to hardware to test it." (As I remember it, the optimized code worked flawlessly).
Posted May 20, 2016 16:36 UTC (Fri)
by jlargentaye (subscriber, #75206)
[Link]
http://www.gelato.unsw.edu.au/archives/git/0504/1753.html
Of note, this was also work on a hash function.
Posted May 24, 2016 17:47 UTC (Tue)
by jzbiciak (guest, #5246)
[Link]
Wouldn't it be something if "George Spelvin" was actually Knuth? ;-) (It seems unlikely, but I'm amused by the possibility.)
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Someone out there probably wants to take the test harness that Linus threw together (mentioned in the email thread) and put it in tools/testing in a useful way. What can we ... What can *I* do to permit, empower, and encourage that?
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem
Generic hashing functions and the platform problem