Path: utzoo!utgpu!news-server.csri.toronto.edu!cs.utexas.edu!swrinde!zaphod.mps.ohio-state.edu!rpi!uupsi!sunic!news.funet.fi!hydra!cc.helsinki.fi!wirzenius From: wirzenius@cc.helsinki.fi (Lars Wirzenius) Newsgroups: comp.lang.c Subject: Re: Efficient STRing CoMPares? Message-ID: <1991Mar16.211154.5568@cc.helsinki.fi> Date: 16 Mar 91 21:11:54 GMT References: <1991Mar15.142821@mars.mpr.ca> Organization: University of Helsinki Lines: 53 In article <1991Mar15.142821@mars.mpr.ca>, stone@mars.mpr.ca (Darren Stone) writes: > Does anyone know if there's a better (!?) way to compare > strings than with strcmp()? > I'm looking for any speed advantage I can get! For non-portability, the fastest way would probably be to write inline assembly code using the machine's string compare instruction. Of course, this instruction doesn't exist in all machines, but if it does, it's probably quite fast. Another way (hopefully somewhat more portable) to go is to be clever, and to avoid string comparisons by modifying the algorithm so they aren't needed (or are needed less frequently). After that you may be able to improve on the comparison method. One way is to avoid calling strcmp if the first characters differ: #define STRCMP(s,t) (*(s) == *(t) ? strcmp((s), (t)) : *(s) - *(t)) (or something along those lines, this is untested). If your strings are mostly going to differ in the first few bytes, you could pre-compute a special value based on those bytes, and compare it first: #define BITS_IN_CHAR 8 struct string { char *str; long value; }; void compute_value(struct string *s) { int i; s->value = 0; for (i = 0; s->str[i] != '\0' && i < sizeof(long); ++i) { s->value <<= BITS_IN_CHAR; s->value |= s->str[i]; } if (i < sizeof(long)) s->value <<= BITS_IN_CHAR * (sizeof(long) - i); } (The above could be optimized, but you get the idea.) If the pre-computed values differ, you know the strings differ in the first sizeof(long) bytes, so there's no need to call strcmp. Furthermore, if you are careful to arrange the first character as the most significant byte of the value (and so on), you can even find out which string (according to the byte values, no locale collation sequence here) is greater. Of course, if you need the locale collation sequence, you have to use strcmp. -- Lars Wirzenius wirzenius@cc.helsinki.fi