LWN.net Logo

Opportunistic Garbage Collection Obviousness

Opportunistic Garbage Collection Obviousness

Posted Apr 22, 2011 6:10 UTC (Fri) by orcmid (guest, #74478)
In reply to: Opportunistic Garbage Collection Obviousness by orcmid
Parent article: Google Linux servers hit with $5m patent infringement verdict (The Register)

I did go look at Berkeley DB because it does use a hashing scheme. What is revealed there is that hashing does not support indexed-sequential operation, but the B+Tree access method does. (This gets determined when you build the DB.)

The hashing method is extended linear hashing, where the hash function is adjusted dynamically to allow growing of the hash table. Linear hashing is mentioned on pp.548-549 (just before the exercises of section 6.4) of Don Knuth's "Art of Computer Programming," vol.3 Sorting and Searching, second edition. There is not enough information there to know how collisions on the same hash are dealt with.


(Log in to post comments)

Opportunistic Garbage Collection Obviousness

Posted Apr 22, 2011 18:32 UTC (Fri) by Wol (guest, #4433) [Link]

In linear hashing (typically used in Pick-based systems) it's normal to have a bucket that can hold multiple records. Records with the same hash are chained in a bucket, and should the bucket overflow, the records are often chained on the end of the file. So let's say your "modulo" is 20 - buckets 0-19 - the first bucket to overflow chains into 20, then next into 21, etc.

This then causes a bit of grief when the modulo increments, because you have to shove bucket 20 out of the way to create a new bucket 20 as non-overflow.

At least one implementation (Prime Information) used a special file/directory type, so each SEGSAM file contained subfiles, one per bucket, and they could grow as big as required.

Cheers,
Wol

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds