|
|

Posted Jun 1, 2005 16:15 UTC (Wed) by Blaisorblade (guest, #25465)
Parent article: Fish - The friendly interactive shell

Ok, probably the search algorithm is ok...
For the hash function, SHA1 has almost 0 probability of collision and gives a 160 bits hash; to avoid collisions between around 5000 entries, it's really overkill. Actually a good hash function for non-cryptographic purposes (assuming you go compare entries with the same hash) is of the followind kind:

```int hash(char a[], int size, int prime) {
char * p = a;
int sum = 0;
while (*p) {
sum = sum * prime;
sum = (sum + *p++) % size;
//printf("%d ", sum);
}
return sum;
}
```

it returns an integer in the range [0..size); (and prime must be a prime number, which usually you choose at compile time). prime=131 is a good choice. It is easy to add this code to build a testing program with the below (send size and prime on the first line, then a test file, and then `sort|uniq -c|`count the average number of collisions on used hash codes with awk).

```char str[1000];

int main(void) {
int size, prime;
scanf("%d %d", &size, & prime);
while (scanf("%s", str) != EOF) {
printf("%d\n", hash(str, size, prime));
}
}
```