User: Password:
|
|
Subscribe / Log in / New account

Algorithms for loading history file

Algorithms for loading history file

Posted Jun 1, 2005 11:11 UTC (Wed) by liljencrantz (guest, #28458)
In reply to: Algorithms for loading history file by Blaisorblade
Parent article: Fish - The friendly interactive shell

The history search uses a linear search right now, since the history list is so small that searches take virtually no time at all. Making the search use a more clever algorithm would just increase code bloat without adding any noticable benefit to the user.

The thing that _does_ take time is reading the history file from disk. I haven't looked in to whether this is because of disk latency, because fish does mutiple mallocs for every entry, meaning several thousand mallocs are performed when loading into memory or if this is caused by using a slow hash function in the table used for removing duplicates. The hash function fish uses is pretty slow, since it does quite a lot of work. It is designed in much the same way as the SHA family of cryptographic functions, but with fewer laps and a smaller data set. The upside is that the distribution of the hash values is very good. The downside is that the hash function is a bit slow.


(Log in to post comments)

Algorithms for loading history file

Posted Jun 1, 2005 16:15 UTC (Wed) by Blaisorblade (guest, #25465) [Link]

Ok, probably the search algorithm is ok...
For the hash function, SHA1 has almost 0 probability of collision and gives a 160 bits hash; to avoid collisions between around 5000 entries, it's really overkill. Actually a good hash function for non-cryptographic purposes (assuming you go compare entries with the same hash) is of the followind kind:

int hash(char a[], int size, int prime) {
  char * p = a;
  int sum = 0;
  while (*p) {
    sum = sum * prime;
    sum = (sum + *p++) % size;
    //printf("%d ", sum);
  }
  return sum;
}

it returns an integer in the range [0..size); (and prime must be a prime number, which usually you choose at compile time). prime=131 is a good choice. It is easy to add this code to build a testing program with the below (send size and prime on the first line, then a test file, and then sort|uniq -c|count the average number of collisions on used hash codes with awk).

char str[1000];

int main(void) {
  int size, prime;
  scanf("%d %d", &size, & prime);
  while (scanf("%s", str) != EOF) {
    printf("%d\n", hash(str, size, prime));
  }
}


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds