LWN: Comments on "A generic hash table" https://lwn.net/Articles/510202/ This is a special feed containing comments posted to the individual LWN article titled "A generic hash table". en-us Mon, 10 Nov 2025 05:52:36 +0000 Mon, 10 Nov 2025 05:52:36 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net A generic hash table https://lwn.net/Articles/512012/ https://lwn.net/Articles/512012/ HelloWorld <div class="FormattedComment"> You're right of course :). Oops.<br> </div> Fri, 17 Aug 2012 09:49:32 +0000 A generic hash table https://lwn.net/Articles/512001/ https://lwn.net/Articles/512001/ jzbiciak <P>Actually... any use of that<TT> max </TT>that returns<TT><B> bool </B></TT>would have rather unusual semantics with that return type. Almost reminds me of <A rel="nofollow" HREF="http://me.veekun.com/blog/2012/04/09/php-a-fractal-of-bad-design/">PHP</A>. ;-)</P> <P>As far as the whole C / C++ debate goes: Up until a few years ago I was an ardent C proponent and avoided C++. Since then, I've really given C++ a chance, and I now tend to prefer it for new projects. I still use C for quick one-offs, but for anything larger I use C++.</P> <P>For one thing, C++ compilers are much better now than when I first looked at them ~15 years ago. You can throw some seriously complicated template messes at a modern compiler and it'll sort it out and give you rather tight code more often than not. 5 to 10 years ago, good code in that circumstance was more the exception than the rule. Bringing STL into the language in the 90s, plus the rise of Boost and template metaprogramming really forced compilers to up their game here.</P> <P>I remember around 10 years ago playing with the "Stepanov" benchmark. It's a benchmark that measures a compiler's ability to deal with abstraction in a C++ program by wrapping a simple addition with a<TT> double </TT>in increasing amounts of C++ abstraction. G++ performed rather poorly at the time (incurring a 10x - 15x slowdown at the worst, IIRC) and our own compiler at work showing a 50x slowdown. Now I believe both compilers show no slowdown. They are able to flatten away the abstractions. It's beautiful.</P> <P>For the problems I'm tackling in my day job, I really appreciate the compositional style templates offer me. Too often I find myself needing "an X with detail Y". In C, I'd have to code up a separate function that combined the two, or a macro that instantiated X subbing in a Y where necessary. In C++, it's just X&lt;Y&gt;, if X is a suitable template. I get type safety and an awful lot of compile-time computation and help.</P> <P>This can be really wild but useful at the limit. I have a truly compelling example I'd love to share, but since it's my day job and it's proprietary technology, I don't know just how much I can share. I will say that the code I generate relies on a ton of compile-time computation to generate code that's sufficiently efficient that I don't mind running on design simulations of our chips that run at speeds on the order of 100Hz, plus/minus an order of magnitude. You read that right: anywhere from 10 to 1000 CPU cycles per second.</P> <P>(Note: I've completely ignored error messages. C error messages are rarely obtuse and usually easily grokked. C++ with multiple nested templates is another ball of yarn altogether... They generally suck, and I'm with you if you think they're an abomination.)</P> Fri, 17 Aug 2012 09:02:01 +0000 A generic hash table https://lwn.net/Articles/511998/ https://lwn.net/Articles/511998/ jzbiciak <BLOCKQUOTE><TT>template&lt;typename T&gt; <B>bool</B> max(T x, T y) { return x &gt; y ? x : y; }</TT></BLOCKQUOTE> <P>Erm... I don't think that's the right return type, unless you want<TT> max(a, max(b, c)) </TT>to have rather unusual semantics.</P> Fri, 17 Aug 2012 08:18:23 +0000 User space unit tests? https://lwn.net/Articles/511363/ https://lwn.net/Articles/511363/ alex <div class="FormattedComment"> While enabling run time testing of some kernel features is useful (the VM torture tests spring to mind) it would be much better if library code was testable in a user space build that didn't involve bringing a kernel up.<br> <p> How about adding "make unittests" as a target for the kernel build?<br> </div> Wed, 15 Aug 2012 15:01:40 +0000 A generic hash table https://lwn.net/Articles/511246/ https://lwn.net/Articles/511246/ HelloWorld <div class="FormattedComment"> <font class="QuotedText">&gt; I look forward to you arguing that such implementations aren't good while ignoring that most of the problems with macros also exist with templates.</font><br> Nonsense, there are numerous problems with macros. Here's a simple example:<br> template&lt;typename T&gt; bool max(T x, T y) { return x &gt; y ? x : y; }<br> #define max(x, y) x &gt; y ? x : y<br> <p> The macro version will evaluate one argument twice. Bad things happen when you do max(a, max(b, c)). Yes, that can be fixed by parenthesizing each macro argument, but I shouldn't need to deal with that kind of crap. The template version will complain when the types don't match, the macro won't.<br> Other people have tried to make it suck less by using macros to generate functions:<br> <a href="http://www.canonware.com/rb/">http://www.canonware.com/rb/</a><br> That sucks too, because I really don't feel that I should be responsible for the bookkeeping that approach requires. It also doesn't work if you need to deal with complex types like int(*)(void) or whatever. Yes, one can use a typedef in that case. No, that's not how it should be. Generic algorithms and data structures are important enough to warrant language features to support them. <br> <p> <font class="QuotedText">&gt; Whenever the many deficiencies of STL is pointed out, you defensively say that "they fixed that in boost::XXX",</font><br> Boost doesn't usually "fix" the STL, it simply provides solutions to other problems than the STL does. std::list (de)allocates memory for you but works with arbitrary types. boost.intrusive don't work with all types (because they need the pointers to be embedded within the type), but it's more flexible with regards to memory management.<br> <p> <font class="QuotedText">&gt; but ignore that any missing containers, string types and other omissions from the standard C library can be easily rectified through the use of a small set of additional libraries.</font><br> It can't, generic data structures suck when the language doesn't support them. And besides, it's not like one always needs boost. Often enough, STL containers and algorithms do the job just fine.<br> </div> Tue, 14 Aug 2012 19:56:36 +0000 A generic hash table https://lwn.net/Articles/511228/ https://lwn.net/Articles/511228/ sashal <div class="FormattedComment"> Indeed this is a great suggestion. Some sort of CONFIG_HASHTABLE_TEST (similar to what happens with list sorting for example) is definitely something I'd like to add.<br> <p> Thanks for the suggestion!<br> </div> Tue, 14 Aug 2012 17:02:28 +0000 A generic hash table https://lwn.net/Articles/511138/ https://lwn.net/Articles/511138/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; Hash maps were invented in the fifties. C++ has been a popular language since the eighties. The fact that they couldn't design a hash table implementation that was good enough to slap a sticker on it in 1998</font><br> Not in 1998 but in late 1995 (that's when the final drafts were produced). And the whole idea of STL was pretty new at that point, it's actually a small miracle they managed to put it into the C++ Standard as it is.<br> <p> <font class="QuotedText">&gt; Do you want an open addressing implementation? What type of probing? A Cuckoo hash? Do you use a linked list to allow ordered iteration? Do you allow hash table modification during iteration? If so, only using that iterator or arbitrary insertions? What resize strategy do you use when growing the table? What resize strategy do you use when shrinking the table? Do you store precalculated hash codes in the table? Do you use tombstones to mark deleted entries? </font><br> <p> You've described about 100 different structures (considering various permutations). Some of the functionality can be split into policies (they work wonderfully with templates, incidentally). Otherwise, library designers just pick a generic enough version and go along with it. <br> <p> If you need some special tuned version for your very specific need - you're welcome to write it. If you abide by STL's standards then you can even use them with STL/Boost algorithms. Without any loss of performance, I might add.<br> </div> Tue, 14 Aug 2012 00:04:01 +0000 A generic hash table https://lwn.net/Articles/511137/ https://lwn.net/Articles/511137/ liljencrantz <div class="FormattedComment"> The part where you say that it is impossible to write type-safe, generic and fast containers in C is demonstratably false. All of the above can be had with macros. I look forward to you arguing that such implementations aren't good while ignoring that most of the problems with macros also exist with templates.<br> <p> Whenever the many deficiencies of STL is pointed out, you defensively say that "they fixed that in boost::XXX", but ignore that any missing containers, string types and other omissions from the standard C library can be easily rectified through the use of a small set of additional libraries.<br> </div> Mon, 13 Aug 2012 23:55:38 +0000 A generic hash table https://lwn.net/Articles/511136/ https://lwn.net/Articles/511136/ HelloWorld <div class="FormattedComment"> <font class="QuotedText">&gt; I find it hilarious that I posted a s/Linux/C++/ of your own comment and you call me a troll.</font><br> That's because the arguments don't make sense any more when they're applied the other way around (and that's a good thing, as they'd be meaningless otherwise).<br> </div> Mon, 13 Aug 2012 23:50:29 +0000 A generic hash table https://lwn.net/Articles/511135/ https://lwn.net/Articles/511135/ liljencrantz <div class="FormattedComment"> I find it hilarious that I posted a s/Linux/C++/ of your own comment and you call me a troll. Do they have mirrors in your world?<br> </div> Mon, 13 Aug 2012 23:46:46 +0000 A generic hash table https://lwn.net/Articles/511132/ https://lwn.net/Articles/511132/ liljencrantz <div class="FormattedComment"> Hash maps were invented in the fifties. C++ has been a popular language since the eighties. The fact that they couldn't design a hash table implementation that was good enough to slap a sticker on it in 1998 (that's almost 20 years down the line) is a strong testament to just how hard the trade offs are when trying to design generic C++ data structures.<br> <p> Do you want an open addressing implementation? What type of probing? A Cuckoo hash? Do you use a linked list to allow ordered iteration? Do you allow hash table modification during iteration? If so, only using that iterator or arbitrary insertions? What resize strategy do you use when growing the table? What resize strategy do you use when shrinking the table? Do you store precalculated hash codes in the table? Do you use tombstones to mark deleted entries? <br> <p> No matter what you answer to the above questions, you will make your hash table nearly useless for a large swath of people. And you can try to make the answer to all those questions decidable at compile time through templates, but that solution comes with it's own set of painful drawbacks.<br> <p> You might also notice, that generic tree based red black trees have been present in the Linux kernel since forever.<br> </div> Mon, 13 Aug 2012 23:42:42 +0000 A generic hash table https://lwn.net/Articles/511106/ https://lwn.net/Articles/511106/ HelloWorld <div class="FormattedComment"> <font class="QuotedText">&gt; One has to wonder why the language is only now gettin (sic) such a basic data structure as a generic hash table</font><br> The reason is trivial: The committee ran out of time when C++98 was standardized, thus hash tables didn't get in. However, boost and various STL vendors have been shipping hash tables for a long time. And let's not talk about the fact that C11 doesn't ship any containers at all.<br> <p> <font class="QuotedText">&gt; &lt;pointless trolling removed&gt;</font><br> <p> <font class="QuotedText">&gt; Every insert into the C++ linked list causes a memory allocation.</font><br> One can specify an allocator if the default one isn't fast enough, and if that still doesn't cut it, one can use something like boost.intrusive, where one has full control over all memory (de)allocation. The STL isn't all things to all people. The point is that in C++ it's actually *possible* to write good (i. e. type-safe, generic, fast) containers, while it isn't in C. <br> <p> <font class="QuotedText">&gt; Until recently, LinkedList::size() was an O(n) operation.</font><br> Having size() be an O(1) operation requires the size to be stored in the list and updated when elements are added or removed. This means that one can either move an iterator range from one list to another in constant time, or have size() run in constant time, but not both, because if you need to update the list sizes, you need to count the elements in the range, and requires linear time. So it actually makes sense for size() to run in linear time depending on what you want to do.<br> <p> <font class="QuotedText">&gt; The attempts to make vector&lt;bool&gt; fast were a horrible failure.</font><br> Well, vector&lt;bool&gt; was a mistake, shit happens. It tends to be a minor problem in the real world. One can just use a std::vector&lt;uint8_t&gt; or std::bitset or boost::dynamic_bitset or whatever else does the job.<br> <p> <font class="QuotedText">&gt; The COW abilities of basic_string are mostly a joke</font><br> C11 not having a string type at all, that's what is a joke. <br> <p> <font class="QuotedText">&gt; &lt;yet more pointless trolling removed&gt;</font><br> </div> Mon, 13 Aug 2012 23:26:41 +0000 A generic hash table https://lwn.net/Articles/511121/ https://lwn.net/Articles/511121/ dlang <div class="FormattedComment"> <font class="QuotedText">&gt; C++11 is the first version of the language to feature a built in hash table. One has to wonder why the language is only now gettin (sic) such a basic data structure as a generic hash table.</font><br> <p> It's not that C++ is getting a "generic hash table"<br> <p> It's that the Linux Kernel project is implementing something they call a "generic hash table" that combines all the various features that the different hash tables in the kernel implement.<br> <p> C++ has a very generic hash table, it's also not very useful by itself, and as a result it does 80% of the work and you implement the remaining 20% to match your application.<br> <p> In the Linux Kernel, this 20% has been implemented a dozen different ways (working to solve a dozen different types of problems). This is a unified implementation that proposes to cover all dozen use cases with one set of code.<br> </div> Mon, 13 Aug 2012 22:39:03 +0000 A generic hash table https://lwn.net/Articles/511116/ https://lwn.net/Articles/511116/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt;C++11 is the first version of the language to feature a built in hash table. One has to wonder why the language is only now gettin (sic) such a basic data structure as a generic hash table.</font><br> <p> That's because C++11 is the first major revision of C++ since 1998 (yeah, blame the ISO committee). Hash maps have been present in various non-standard ways in C++ stdlibs since 90-s. That's also the reason they are called 'unordered_maps' in the C++11 to avoid all kinds of name collisions.<br> <p> You might also notice, that tree based std::map has been present since C++98.<br> </div> Mon, 13 Aug 2012 22:18:25 +0000 A generic hash table https://lwn.net/Articles/511104/ https://lwn.net/Articles/511104/ liljencrantz <div class="FormattedComment"> C++11 is the first version of the language to feature a built in hash table. One has to wonder why the language is only now gettin (sic) such a basic data structure as a generic hash table. I think it's because generic data structures are insanely hard in a language like C++. It's not really possible to make them both generic and performant (at least not without hortrible template hackery), and making them customizable (so you can apply different policies regarding memory allocation, collision handling, etc.) is even harder.<br> <p> Yes, C is an ugly language and it has many warts. But it's *useful* in that it lets me write code in the way I want to write it: without hidden costs, reasonably close to the hardware and just as efficient as C++, if not more so.<br> <p> Seriously though, writing generic data structures in C++ is a hard enough problem that even stl is pretty crappy for a lot of applications. Every insert into the C++ linked list causes a memory allocation. Until recently, LinkedList::size() was an O(n) operation. The attempts to make vector&lt;bool&gt; fast were a horrible failure. The COW abilities of basic_string are mostly a joke because of weak C++ aliasing rules. C++ provides you with half the tools you need to write good generic data structures and all the rope you need to hang yourself while chasing that pipe dream.<br> <p> </div> Mon, 13 Aug 2012 21:40:07 +0000 A generic hash table https://lwn.net/Articles/510621/ https://lwn.net/Articles/510621/ nix <div class="FormattedComment"> Yeah. There are hash designs which don't have this problem -- e.g. any based on trees (you can even arrange that deletion of the key you're currently iterating over will only impose the same overhead on the next iteration as a normal hash lookup). Downside: trees often involve a lot of pointer chasing, and some of them are write-heavy on update and even on read (e.g. splay trees: nice self-optimizing data structure, shame about your cacheline contention).<br> </div> Fri, 10 Aug 2012 16:28:55 +0000 A generic hash table https://lwn.net/Articles/510522/ https://lwn.net/Articles/510522/ nybble41 <div class="FormattedComment"> <font class="QuotedText">&gt; Downside: you still do need to resize now and then, which is hateful because it breaks active iterators unless you have crazy complexity to deal with that case alone.</font><br> <p> Even without resizing, keys can change location when other items are inserted, so you can't make very many guarantees. Even with a normal hash table it's difficult to say whether you will see the inserted key later on; with a cuckoo hash table you may miss _other_ keys, or even double-iterate them.<br> <p> If you must modify a hash table with active iterators, perhaps you should collect the list of keys up front, atomically, and iterate over that instead. It may not be quite as efficient as iterating over the hash table directly, in memory or CPU time, but the result will be much more predictable. Or you could make a shallow copy of the hash table to iterate over.<br> </div> Fri, 10 Aug 2012 02:17:37 +0000 A generic hash table https://lwn.net/Articles/510505/ https://lwn.net/Articles/510505/ nix <div class="FormattedComment"> Ooo. I'd failed to notice cuckoo hashes: very nice (particularly with more than one item: 50% loading is rather cruddy, but 91% is close enough to 100% as makes no odds. Combine that with using three hash functions and you could get it *very* close to 100% for only a small constant slowdown. Downside: you still do need to resize now and then, which is hateful because it breaks active iterators unless you have crazy complexity to deal with that case alone. Yes, you *can* ban modification of hashes being iterated over, but I consider that a bad-quality implementation.)<br> <p> </div> Thu, 09 Aug 2012 23:26:29 +0000 A generic hash table https://lwn.net/Articles/510471/ https://lwn.net/Articles/510471/ HelloWorld <div class="FormattedComment"> One has to wonder why the kernel is only now gettin such a basic data structure as a generic hash table. I think it's because generic data structures aren't all that useful in a language like C. It's not really possible to make them both generic and typesafe (at least not without horrible macro hackery), and making them customizable (so you can apply different policies regarding memory allocation, collision handling etc.) is even harder.<br> Yes, C++ is an ugly language and it has many warts. But it's *useful* in that it lets me write code in the way I want to write it: generic, type-safe, flexible and just as efficient as C, if not more so.<br> </div> Thu, 09 Aug 2012 20:17:19 +0000 A generic hash table https://lwn.net/Articles/510421/ https://lwn.net/Articles/510421/ pj <div class="FormattedComment"> Coming right after the 'kernel regressions' article, this made me wonder... does this new generic hash table implementation come with a test suite? If so, that would be a measurable, notable improvement over the other N implementations that Sasha could point to.<br> </div> Thu, 09 Aug 2012 14:32:57 +0000 A generic hash table https://lwn.net/Articles/510344/ https://lwn.net/Articles/510344/ alonz Re the pointer-chasing vs. simple table argument&mdash;has anyone tried to modify the hash tables to implement <a href="http://en.wikipedia.org/wiki/Cuckoo_hashing">Cuckoo hash</a> behavior? From my experience these provide nice gains over a simple hash table, and with zero added complexity. Thu, 09 Aug 2012 05:02:27 +0000