Algorithms for loading history file
Posted Jun 1, 2005 16:15 UTC (Wed) by
Blaisorblade (guest, #25465)
In reply to:
Algorithms for loading history file by liljencrantz
Parent article:
Fish - The friendly interactive shell
Ok, probably the search algorithm is ok...
For the hash function, SHA1 has almost 0 probability of collision and gives a 160 bits hash; to avoid collisions between around 5000 entries, it's really overkill. Actually a good hash function for non-cryptographic purposes (assuming you go compare entries with the same hash) is of the followind kind:
int hash(char a[], int size, int prime) {
char * p = a;
int sum = 0;
while (*p) {
sum = sum * prime;
sum = (sum + *p++) % size;
//printf("%d ", sum);
}
return sum;
}
it returns an integer in the range [0..size); (and prime must be a prime number, which usually you choose at compile time). prime=131 is a good choice. It is easy to add this code to build a testing program with the below (send size and prime on the first line, then a test file, and then sort|uniq -c|count the average number of collisions on used hash codes with awk).
char str[1000];
int main(void) {
int size, prime;
scanf("%d %d", &size, & prime);
while (scanf("%s", str) != EOF) {
printf("%d\n", hash(str, size, prime));
}
}
(
Log in to post comments)