Ok, probably the search algorithm is ok...
For the hash function, SHA1 has almost 0 probability of collision and gives a 160 bits hash; to avoid collisions between around 5000 entries, it's really overkill. Actually a good hash function for non-cryptographic purposes (assuming you go compare entries with the same hash) is of the followind kind:
int hash(char a[], int size, int prime) { char * p = a; int sum = 0; while (*p) { sum = sum * prime; sum = (sum + *p++) % size; //printf("%d ", sum); } return sum; }
it returns an integer in the range [0..size); (and prime must be a prime number, which usually you choose at compile time). prime=131 is a good choice. It is easy to add this code to build a testing program with the below (send size and prime on the first line, then a test file, and then sort|uniq -c|
count the average number of collisions on used hash codes with awk).
char str[1000]; int main(void) { int size, prime; scanf("%d %d", &size, & prime); while (scanf("%s", str) != EOF) { printf("%d\n", hash(str, size, prime)); } }
Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds